Utilizing Machine Learning for Speech Emotion Recognition


  • M. Prithi, Research Scholar, Department of Computer Science, Periyar University, Salem, Tamilnadu
  • Sankari M., Assistant Professor, Department of CSE, Sathyabama Institute of Science and Technology
  • Jayashri Prashant Shinde, Assistant Professor, Information Technology Department, G H Raisoni College of Engineering and Management, Pune
  • Rakesh Kumar, Department of Computer Engineering & Applications, GLA University, Mathura
  • A. Deepak, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamilnadu
  • Hemant Singh Pokhariya, Assistant Professor, Department of Computer Science & Engineering, Graphic Era Deemed to be University, Dehradun, Uttarakhand
  • Anurag Shrivastava, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, Tamilnadu


Speech Emotion Recognition, Machine Learning, MLP Classifier, Accuracy


Speech emotion recognition applies machine learning techniques to identify and interpret the emotions conveyed in speech. The primary objective of this research is to recognize and classify emotions accurately using machine learning algorithms and data analysis techniques. Features such as pitch, intensity, and spectral characteristics are extracted from a large collection of labeled voice recordings. Machine learning models, including Support Vector Machines (SVM), Multilayer Perceptron (MLP) classifiers, Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM) networks, are then trained on this data to learn the patterns and correlations between these features and emotions. Once trained, the models can identify emotions in real-time speech input. Applications of speech emotion recognition span virtual assistants, mental health monitoring, human-computer interaction, and entertainment. However, challenges such as variability, subjectivity, cultural differences, and contextual influences must be addressed to improve the accuracy and robustness of these systems, and ongoing research seeks to overcome them. Integrating machine learning into speech emotion recognition opens up possibilities for analyzing emotions in speech and contributes to a deeper understanding of human communication and interaction.
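
As a rough illustration of the feature-extraction step described above, the following sketch computes one pitch estimate, one intensity value, and one spectral descriptor from a short mono audio frame using only NumPy. It is not the authors' pipeline: the function name extract_features, the parameters fmin and fmax, and the choice of autocorrelation for pitch, RMS for intensity, and spectral centroid as the spectral characteristic are all assumptions made for this example. A practical system would more likely rely on a dedicated audio library and compute these values frame by frame over each recording.

```python
import numpy as np


def extract_features(signal, sample_rate, fmin=50.0, fmax=500.0):
    """Return (pitch_hz, rms_intensity, spectral_centroid_hz) for one mono frame.

    Hypothetical helper for illustration; fmin and fmax bound the pitch search.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()  # remove any DC offset

    # Intensity: root-mean-square amplitude of the frame.
    rms = float(np.sqrt(np.mean(signal ** 2)))

    # Spectral centroid: magnitude-weighted mean frequency of the spectrum.
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = magnitudes.sum()
    centroid = float((freqs * magnitudes).sum() / total) if total > 0 else 0.0

    # Pitch: lag of the strongest autocorrelation peak inside [fmin, fmax].
    autocorr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(autocorr) - 1)
    best_lag = min_lag + int(np.argmax(autocorr[min_lag:max_lag + 1]))
    pitch = sample_rate / best_lag

    return pitch, rms, centroid


# Usage on a synthetic 200 Hz tone (a stand-in for a real voice frame).
sample_rate = 16000
t = np.arange(1600) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 200.0 * t)
pitch, rms, centroid = extract_features(tone, sample_rate)
print(pitch, rms, centroid)
```

Feature vectors of this kind, computed per recording and paired with emotion labels, would form the training input for classifiers such as the SVM or MLP models mentioned in the abstract.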








How to Cite

Prithi, M., M., S., Shinde, J. P., Kumar, R., Deepak, A., Pokhariya, H. S., & Shrivastava, A. (2023). Utilizing Machine Learning for Speech Emotion Recognition. International Journal of Intelligent Systems and Applications in Engineering, 12(4s), 809–818. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3868



Research Article
