Enhancing Sentiment Analysis of Marathi-English Code-Mixed Texts using an Ensemble Model

Authors

  • Zoya Fahad Khan, Research Scholar, Department of Computer Engineering, Datta Meghe College of Engineering, Airoli, Navi Mumbai, Maharashtra, India.
  • S. D. Sawarkar, Professor, Department of Computer Engineering, Datta Meghe College of Engineering, Airoli, Navi Mumbai, Maharashtra, India.

Keywords:

Sentiment Analysis, Code-Mixed Language, Deep Learning, Spiking Neural Network, Ensemble Model, Multilingual Sentiment Analysis, Marathi-English Code Mixing

Abstract

In a linguistically diverse country such as India, speakers frequently blend languages when communicating, producing code-mixed text. The practice is especially common on online platforms, where people express their views most freely. Because such text is informal and unstructured and switches language fluidly, analyzing its sentiment is difficult. This study addresses that difficulty with a method for sentiment analysis of Marathi-English code-mixed data. We develop an ensemble model that combines conventional machine learning techniques with a deep-learning-based Spiking Neural Network (SNN) in order to extract sentiment from the distinctive linguistic context of Marathi-English code-mixed data. Beyond sentiment classification, the model addresses grammatical transitions in code-mixed language and identifies ambiguous words in code-mixed texts. To evaluate the model, we compared its performance with that of existing sentiment analysis models. The results show that the ensemble of n-gram Multinomial Naïve Bayes and the SNN outperforms the other models in accurately identifying the sentiment of Marathi-English code-mixed texts. This research contributes to sentiment analysis of multilingual and code-mixed data and offers a robust approach to understanding user sentiment in this linguistic setting.
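
The ensemble idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two models are combined, so weighted soft voting is assumed here. The SNN is replaced by a fixed placeholder probability vector, and the romanised Marathi-English sentences are invented toy data. The names `NgramMultinomialNB`, `soft_vote`, and `second_model_probs` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats

class NgramMultinomialNB:
    """Multinomial Naive Bayes over word n-gram counts with Laplace smoothing."""

    def fit(self, texts, labels, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feat_counts[label].update(ngrams(text))
        self.vocab = {f for c in self.classes for f in self.feat_counts[c]}
        return self

    def predict_proba(self, text):
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            denom = sum(self.feat_counts[c].values()) + self.alpha * len(self.vocab)
            score = math.log(self.class_counts[c] / total_docs)
            for f in ngrams(text):
                if f in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((self.feat_counts[c][f] + self.alpha) / denom)
            log_scores[c] = score
        # Normalise log scores into probabilities (softmax with max-shift for stability).
        m = max(log_scores.values())
        exp = {c: math.exp(s - m) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}

def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Weighted average of two class-probability dicts; returns (label, combined)."""
    combined = {c: weight_a * prob_a[c] + (1 - weight_a) * prob_b.get(c, 0.0)
                for c in prob_a}
    return max(combined, key=combined.get), combined

# Toy romanised Marathi-English examples (invented for illustration only).
train_texts = [
    "movie khup chan hota loved it",
    "service ekdum mast and fast",
    "food khup vait hota very bad",
    "delivery late ani bekar experience",
]
train_labels = ["positive", "positive", "negative", "negative"]

nb = NgramMultinomialNB().fit(train_texts, train_labels)
nb_probs = nb.predict_proba("movie khup chan and mast")

# Assumed placeholder for the second model's output; in the paper this role
# is played by the SNN, which is not reproduced here.
second_model_probs = {"positive": 0.6, "negative": 0.4}

label, combined = soft_vote(nb_probs, second_model_probs)
print(label, combined)
```

Soft voting keeps each model's confidence in play, so a strongly confident n-gram classifier can outweigh a lukewarm second model. The `weight_a` parameter would normally be tuned on held-out data.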

Published

24.03.2024

How to Cite

Khan, Z. F., & Sawarkar, S. D. (2024). Enhancing Sentiment Analysis of Marathi-English Code-Mixed Texts using an Ensemble Model. International Journal of Intelligent Systems and Applications in Engineering, 12(18s), 741 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5038

Section

Research Article