Leveraging Contextual Factors for Word Sense Disambiguation in Hindi Language

Authors

  • Hirkani Padwad, Department of Computer Science and Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur
  • Gunjan Keswani, Department of Computer Science and Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur
  • Wani Bisen, Department of Computer Science and Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur
  • Rajshree Sharma, Department of Computer Science and Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur
  • Sopan Thakre, Department of Computer Science and Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur
  • Aditi Tiwari, Department of Computer Science and Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur

Keywords:

Word Sense Disambiguation, MuRIL, iNLTK

Abstract

This study presents an unsupervised model for word sense disambiguation (WSD), the task of determining the intended meaning of a word within a sentence. Identifying the correct sense with high precision is essential for applications such as machine translation, information retrieval, question answering, sentiment analysis, summarization, and language generation. In recent years, little work has been done in this field specifically for Indian languages, and the unavailability of large labelled corpora poses a major challenge to applying large language models to the disambiguation task. First, our approach leverages the BERT-based deep learning model MuRIL and measures the Euclidean distance between the synsets of words with multiple senses, achieving an accuracy of 89%. Second, we have curated a dataset, grounded in Indian theories of meaning, that uses contextual factors to disambiguate the exact meaning of a word. The outcomes of this study offer valuable insights into the capabilities of language models applied to Indian languages and their potential for reducing linguistic ambiguity.
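The core selection step described in the abstract (choosing a sense by Euclidean distance) can be illustrated with a minimal sketch. It assumes that the target word in its sentence context and each candidate sense (synset) have already been encoded as fixed-length vectors, for example with MuRIL; how the study actually builds those vectors is not specified here. The toy three-dimensional vectors and the helper names `euclidean` and `pick_sense` are hypothetical and serve only to show the nearest-sense rule, using the classic Hindi ambiguous word सोना ("gold" / "to sleep").

```python
import math
from typing import Dict, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pick_sense(context_vec: Sequence[float],
               sense_vecs: Dict[str, Sequence[float]]) -> str:
    """Return the sense label whose vector lies closest to the context vector."""
    return min(sense_vecs, key=lambda s: euclidean(context_vec, sense_vecs[s]))


# Hypothetical toy vectors; in practice these would be contextual embeddings
# (assumed here to come from a model such as MuRIL), not hand-written numbers.
senses = {
    "सोना (gold)": [0.9, 0.1, 0.0],
    "सोना (to sleep)": [0.1, 0.8, 0.3],
}
context = [0.2, 0.7, 0.4]  # hypothetical embedding of सोना in a sentence about sleeping

print(pick_sense(context, senses))  # prints: सोना (to sleep)
```

Because the rule only needs a distance between vectors, it requires no labelled training data, which is consistent with the unsupervised setting the abstract describes; swapping `euclidean` for another metric (such as cosine distance) would change only the scoring function.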




Published

12.01.2024

How to Cite

Padwad, H., Keswani, G., Bisen, W., Sharma, R., Thakre, S., & Tiwari, A. (2024). Leveraging Contextual Factors for Word Sense Disambiguation in Hindi Language. International Journal of Intelligent Systems and Applications in Engineering, 12(12s), 129–136. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4497

Issue

Vol. 12, No. 12s (2024)

Section

Research Article
