Score Level Fusion Of Bimodal Emotion Recognition System Using Text And Speech


A. Shunmuga Sundari et al.


The meteoric rise of social media services has created unprecedented opportunities for people to publicly express their views, opinions, and attitudes through media such as text, voice, and video. These opinions, views, and attitudes convey user emotion, and emotions form an indispensable, elementary aspect of people's lives. With the growth of social media, a single-modality emotion recognition system can hardly satisfy the demands of emotion recognition. Aiming to optimize recognition performance, this paper proposes SFBM_TS, a score-level fusion of a bimodal emotion recognition system based on text and speech. It predicts emotion over four classes: joy, sadness, anger, and fear. First, the text is analyzed through a selective lexicon-based Bi-LSTM method, and in parallel the speech is analyzed through a deep learning network. Finally, score-level fusion combines the per-class scores produced by both methods and takes the weighted average score for each emotion class; the emotion class with the highest average score is taken as the final result. To refine the classification, text scores are assigned a weight of 0.6 and speech scores a weight of 0.4, so the definitive emotional state is determined by the output of both the audio and text emotion analysis. As a result, the accuracy of the proposed SFBM_TS model is 5% higher than that of the single-modality emotion recognition models.
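The fusion step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function and variable names are hypothetical, the per-class scores are made-up example values, and only the weights (0.6 for text, 0.4 for speech) and the argmax decision rule come from the abstract.

```python
# Sketch of score-level fusion for a bimodal (text + speech) emotion classifier.
# Weights 0.6 (text) and 0.4 (speech) are taken from the paper; all other
# names and values here are illustrative assumptions.

EMOTIONS = ["joy", "sadness", "anger", "fear"]
TEXT_WEIGHT = 0.6
SPEECH_WEIGHT = 0.4


def fuse_scores(text_scores, speech_scores):
    """Weighted average of per-class scores from the text and speech models.

    Returns the winning emotion label and the fused score dictionary.
    """
    fused = {
        emotion: TEXT_WEIGHT * text_scores[emotion] + SPEECH_WEIGHT * speech_scores[emotion]
        for emotion in EMOTIONS
    }
    # The class with the highest fused score is the final prediction.
    return max(fused, key=fused.get), fused


# Example: the text model favours "joy" while the speech model favours "sadness".
text_scores = {"joy": 0.55, "sadness": 0.25, "anger": 0.10, "fear": 0.10}
speech_scores = {"joy": 0.30, "sadness": 0.45, "anger": 0.15, "fear": 0.10}
label, fused = fuse_scores(text_scores, speech_scores)
# fused["joy"] = 0.6*0.55 + 0.4*0.30 = 0.45, which beats fused["sadness"] = 0.33,
# so the text modality's higher weight tips the decision toward "joy".
```

Because the weights sum to 1, the weighted sum is itself the weighted average the abstract refers to.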





Article Details

How to Cite
Sundari, A. S., et al. (2021). Score Level Fusion Of Bimodal Emotion Recognition System Using Text And Speech. Turkish Journal of Computer and Mathematics Education (TURCOMAT), 12(10), 1–12. Retrieved from