Neuroscience-Inspired CNN Model for Automated Emotion Recognition and Captioning in Film Soundtracks


  • V. Bhuvana Kumar Computer Science & Engineering, Hindustan Institute of Science & Technology, Chennai, India
  • M. Kathiravan Computer Science & Engineering, Hindustan Institute of Science & Technology, Chennai, India


Convolutional Neural Networks (CNNs), revolutionize, Music Emotion Recognition (MER), uniformity


The capacity of music to evoke emotions has drawn increasing attention, particularly with the growth of music streaming services and automated recommendation systems. This research addresses Music Emotion Recognition (MER) by combining audio features extracted from digital samples with a range of machine learning techniques. Unlike standard MER methods, which often overlook them, the approach takes into account neuroscience findings on how musical emotion is perceived. The aim is to advance automatic music subtitling, specifically the detection of emotion in movie soundtracks. The study uses established scientific music databases that are recognized in neuroscience research. The key experimental tool is the Constant-Q Transform (CQT) spectrogram, whose logarithmically spaced frequency bins approximate human perception of musical tones; these spectrograms are analyzed with Convolutional Neural Networks (CNNs). This combination proved effective for classifying emotions in 2-second musical segments, focusing on the primary emotions of happiness, sadness, and fear, which are central to movie music captioning. The results vary significantly across models, which highlights a lack of uniformity in quality metrics. Nonetheless, the research is a substantial step towards automated, accessible music captioning that can identify the emotional intent of movie soundtracks. Such advances in MER and automatic subtitling could change how audiences experience and interact with music in movies.
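The front end described in the abstract (2-second segments turned into constant-Q representations) can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline: the sample rate, minimum frequency, bin count, Hann window, and the choice to drop a trailing partial segment are all assumptions made for illustration, and the function names `split_segments`, `cqt_frequencies`, and `cqt_frame` are hypothetical. The sketch computes a single naive constant-Q frame per segment, whereas a full spectrogram would repeat this over time and feed the result to a CNN.

```python
import cmath
import math

# Emotion classes named in the abstract; a classifier would predict one per segment.
EMOTIONS = ("happiness", "sadness", "fear")


def split_segments(samples, sample_rate, seconds=2.0):
    """Split a mono signal into non-overlapping fixed-length segments.

    Assumption: a trailing remainder shorter than one segment is dropped.
    """
    size = int(round(sample_rate * seconds))
    count = len(samples) // size
    return [samples[i * size:(i + 1) * size] for i in range(count)]


def cqt_frequencies(f_min, n_bins, bins_per_octave=12):
    """Geometrically spaced centre frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_frame(segment, sample_rate, f_min=110.0, n_bins=36, bins_per_octave=12):
    """Magnitudes of one naive constant-Q frame taken at the segment start.

    Each bin uses a window of about Q cycles of its centre frequency, so low
    bins get long windows and high bins get short ones (constant Q factor).
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    magnitudes = []
    for f_k in cqt_frequencies(f_min, n_bins, bins_per_octave):
        n_k = math.ceil(q * sample_rate / f_k)  # window length for this bin
        if n_k > len(segment):
            raise ValueError("segment too short for the lowest bin")
        acc = 0j
        for n in range(n_k):
            hann = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_k - 1))
            acc += hann * segment[n] * cmath.exp(-2j * math.pi * f_k * n / sample_rate)
        magnitudes.append(abs(acc) / n_k)
    return magnitudes
```

As a usage example, a 5-second 440 Hz sine sampled at 8000 Hz yields two 2-second segments from `split_segments`, and `cqt_frame` on the first segment peaks at the bin whose centre frequency is 440 Hz (two octaves above the assumed `f_min` of 110 Hz). In practice an optimized library implementation of the CQT would replace the naive loop.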








How to Cite

Kumar, V. B., & Kathiravan, M. (2024). Neuroscience-Inspired CNN Model for Automated Emotion Recognition and Captioning in Film Soundtracks. International Journal of Intelligent Systems and Applications in Engineering, 12(15s), 215–222.



Research Article