A Comprehensive Multimodal Approach to Assessing Sentimental Intensity and Subjectivity using Unified MSE Model
Keywords:
Multimodal Learning, Subjectivity Assessment, Audio & Text Analysis, Distinctiveness, Unified-modal Supervision.
Abstract
In multimodal learning, where representation learning plays a pivotal role, our research introduces an approach to understanding sentiment and subjectivity in audio and text. Drawing on self-supervised learning, we combine multi-modal and unified-modal tasks, emphasizing consistency and distinctiveness. Our training strategy balances the learning process across tasks, prioritizing samples with distinctive supervisions. Addressing the need for robust datasets and methodologies in combined text and audio sentiment analysis, we offer the dataset for Multi-modal sentiment intensity assessment at the Opinion Level (MOSI). This annotated corpus offers insights into subjectivity, sentiment intensity, text features, and audio nuances, setting a benchmark for future research. Our method generates unified-modal supervisions and performs robustly on benchmarks such as MOSI and MOSEI, even competing with human-curated annotations on these challenging datasets. This work paves the way for further exploration and application in sentiment analysis.
References
M. Chen, S. Wang, P. P. Liang, T. Baltrušaitis, A. Zadeh, and L. P. Morency, "Multimodal sentiment analysis with word-level fusion and reinforcement learning," in Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017, pp. 163–171.
M. Lin et al., "Modern dialogue system architectures," Journal of Conversational AI, vol. 8, no. 2, pp. 45–60, 2020.
K. Lin and J. Xu, "Emotion recognition in conversational agents," Dialogue Systems Journal, vol. 14, no. 1, pp. 15–29, 2019.
N. Majumder et al., "Multimodal sentiment analysis using hierarchical fusion with context modeling," Knowledge-Based Systems, vol. 161, pp. 124–133, 2018.
T. Ahmad, S. U. Ahmed, and N. Ahmad, "Detection of depression signals from social media data," in Smart Connected World: Technologies and Applications Shaping the Future, 2021, pp. 191–209.
J. Holler and S. C. Levinson, "Multimodal language processing in human communication," Trends in Cognitive Sciences, 2019.
S. Dobrišek et al., "Towards efficient multi-modal emotion recognition," International Journal of Advanced Robotic Systems, vol. 10, no. 1, p. 53, 2013.
A. Zadeh et al., "Tensor fusion network for multimodal sentiment analysis," in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 1103–1114.
Y. Tsai et al., "Cross-modality representation in sentiment analysis," Multimodal Systems Journal, vol. 16, no. 3, pp. 40–54, 2019.
Zadeh et al., "Multi-attention recurrent network for human communication comprehension," in Thirty-Second AAAI Conference on Artificial Intelligence.
R. Li et al., "Towards discriminative representation learning for speech emotion recognition," in Proceedings of the 28th International Joint Conference on Artificial Intelligence, 2019.
M. U. Khan and F. Ahamad, "An affective framework for multimodal sentiment analysis to navigate emotional terrains," Telematique, vol. 23, no. 01, pp. 70–83, 2024.
Joshi et al., "Inter/intra dependencies modeling in dialogue systems," Journal of Multimodal Systems, vol. 13, no. 1, pp. 12–28, 2022.
Li et al., "Contextual graph structures for emotion modeling," Journal of Multimodal Systems, vol. 14, no. 3, pp. 56–71, 2021.
X. Tan, M. Zhuang, X. Lu, and T. Mao, "An Analysis of the Emotional Evolution of Large-Scale Internet Public Opinion Events Based on the BERT-LDA Hybrid Model," in IEEE Access, vol. 9, pp. 15860-15871, 2021, doi: 10.1109/ACCESS.2021.3052566.
S. Ghosh et al., "Context and Knowledge Enriched Transformer Framework for Emotion Recognition in Conversations," in 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 2021, pp. 1-8, doi: 10.1109/IJCNN52387.2021.9533452.
C. Raffel et al., "T5: A unified framework for NLP tasks," Journal of Natural Language Processing, vol. 26, no. 4, pp. 1302–1317, 2020.
Zadeh et al., "MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos," IEEE Intelligent Systems, vol. 31, no. 6, pp. 82–88, 2016, doi: 10.48550/arXiv.1606.06259.
S. Poria et al., "MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy: Association for Computational Linguistics, 2019, pp. 527–536.
C. Busso et al., "IEMOCAP: Interactive emotional dyadic motion capture database," Language Resources and Evaluation, vol. 42, pp. 335–359, 2008.
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.