Unsupervised Word Sense Disambiguation for Marathi language using Word Embeddings

Authors

  • Rasika Ransing, Archana Gulati

Keywords

Marathi language, Natural Language Processing, Unsupervised Learning, Word Embeddings, Word Sense Disambiguation

Abstract

Word Sense Disambiguation (WSD) is a significant challenge in natural language processing. Marathi is a low-resource language, so comparatively little research has been conducted on it. WSD methods include knowledge-based and machine learning approaches. The Lesk algorithm, which uses WordNet as its sense inventory, is a prominent knowledge-based technique. Word embeddings represent individual words as low-dimensional, real-valued vectors. We introduce a method that uses the word embeddings of every word in a given phrase except the ambiguous word, together with the word embeddings of the glosses, synonyms, and examples of the ambiguous word obtained from the Marathi WordNet. The most suitable sense of the ambiguous word is then determined from the context and these embeddings. Our results indicate that including the embeddings of synonyms and examples of the ambiguous word increases disambiguation accuracy.
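
The approach outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how embeddings are combined or compared, so averaging the vectors and scoring senses by cosine similarity are assumptions made here. The toy embeddings, the English placeholder words, the `SENSES` inventory (a stand-in for Marathi WordNet entries), and the function names are all hypothetical.

```python
import math

# Toy 3-dimensional embeddings; the values are invented for illustration.
EMBEDDINGS = {
    "river":   [0.9, 0.1, 0.0],
    "water":   [0.8, 0.2, 0.1],
    "shore":   [0.9, 0.0, 0.1],
    "money":   [0.1, 0.9, 0.1],
    "loan":    [0.0, 0.8, 0.2],
    "deposit": [0.1, 0.9, 0.0],
    "sat":     [0.3, 0.3, 0.3],
}

# Hypothetical sense inventory standing in for WordNet entries:
# each sense lists the words of its gloss, its synonyms, and its examples.
SENSES = {
    "bank": {
        "bank_river": {
            "gloss": ["shore", "river"],
            "synonyms": ["shore"],
            "examples": ["water", "river"],
        },
        "bank_finance": {
            "gloss": ["money", "deposit"],
            "synonyms": [],
            "examples": ["loan", "money"],
        },
    }
}


def mean_vector(words):
    """Average the embeddings of the known words; None if none are known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(tokens, target, fields=("gloss", "synonyms", "examples")):
    """Return the sense id whose vector is closest to the context vector."""
    # Context = every token except the ambiguous word itself.
    context_vec = mean_vector([t for t in tokens if t != target])
    if context_vec is None:
        return None
    best_sense, best_score = None, float("-inf")
    for sense_id, entry in SENSES[target].items():
        # Pool the words from the selected WordNet fields for this sense.
        sense_vec = mean_vector([w for f in fields for w in entry[f]])
        if sense_vec is None:
            continue
        score = cosine(context_vec, sense_vec)
        if score > best_score:
            best_sense, best_score = sense_id, score
    return best_sense


print(disambiguate(["sat", "river", "bank", "water"], "bank"))
print(disambiguate(["loan", "bank", "money"], "bank"))
```

The `fields` parameter makes it possible to compare, for example, a gloss-only configuration (`fields=("gloss",)`) with one that also uses synonyms and examples, which mirrors the comparison the abstract reports.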

Published

24.03.2024

How to Cite

Ransing, R., & Gulati, A. (2024). Unsupervised Word Sense Disambiguation for Marathi language using Word Embeddings. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 1374–1380. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5528

Section

Research Article