Neuroscience-Inspired CNN Model for Automated Emotion Recognition and Captioning in Film Soundtracks
Keywords:
Convolutional Neural Networks (CNNs), revolutionize, Music Emotion Recognition (MER), uniformity

Abstract
The capacity of music to evoke emotion has attracted growing attention, particularly with the rise of music streaming services and automated recommendation systems. This research addresses Music Emotion Recognition (MER), combining audio feature extraction from digital samples with a range of machine learning techniques. Distinctive to this approach is the incorporation of neuroscience findings on the perception of musical emotion, which standard MER methods often overlook. The aim is to advance automatic music subtitling, specifically the detection of emotional intent in movie soundtracks. The study draws on musical databases established in neuroscience research. The central experimental tool is the Constant-Q Transform (CQT) spectrogram, a representation that closely mirrors human perception of musical pitch, analyzed with Convolutional Neural Networks (CNNs). This combination proves effective for classifying emotions in 2-second musical segments, focusing on the primary emotions most relevant to movie music captioning: happiness, sadness, and fear. The results show significant variation across models, with no uniformity in their quality metrics. Nonetheless, this research is a substantial step toward automated, accessible music captioning capable of identifying emotional intent in movie soundtracks, an advance in MER and automatic subtitling that could revolutionize how we experience and interact with music in film.
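To make the pipeline described above concrete, the sketch below computes a CQT spectrogram for a 2-second audio excerpt and passes it through a small CNN that scores the three target emotions (happy, sad, fear). This is a minimal illustration under stated assumptions, not the authors' architecture: the file name, the 22,050 Hz sample rate, the 512-sample hop, the 84 CQT bins, and the layer sizes are all choices made here for demonstration, using librosa and PyTorch.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

# --- CQT spectrogram of a 2-second excerpt (file path is hypothetical) ---
SR = 22050  # assumed sample rate; the paper does not specify one
y, sr = librosa.load("soundtrack_clip.wav", sr=SR, duration=2.0)
C = librosa.cqt(y, sr=sr, hop_length=512, n_bins=84, bins_per_octave=12)
S = librosa.amplitude_to_db(np.abs(C), ref=np.max)  # log-magnitude CQT, shape (84, ~87)

# --- minimal CNN over the CQT "image"; 3 classes: happy / sad / fear ---
class EmotionCNN(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # global average pooling makes the head independent of the exact input size
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes))

    def forward(self, x):  # x: (batch, 1, freq_bins, time_frames)
        return self.head(self.features(x))

x = torch.tensor(S, dtype=torch.float32).unsqueeze(0).unsqueeze(0)  # (1, 1, 84, T)
logits = EmotionCNN()(x)  # untrained here; scores are meaningless until the model is fitted
print(logits.shape)       # torch.Size([1, 3]), one score per emotion class
```

The CQT suits this task because its geometrically spaced frequency bins align with musical pitch: a fixed interval such as an octave spans the same number of bins anywhere on the frequency axis, which matches the translation-invariant filters of a CNN.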