Unsupervised Word Sense Disambiguation for Marathi language using Word Embeddings
Keywords:
Marathi language, Natural Language Processing, Unsupervised Learning, Word Embeddings, Word Sense Disambiguation
Abstract
Word Sense Disambiguation (WSD) is a significant challenge in natural language processing. Marathi is a low-resource language, and comparatively little WSD research has addressed it. WSD methods fall broadly into knowledge-based and machine learning approaches. Among knowledge-based methods, the Lesk algorithm, which uses WordNet as its sense inventory, is one of the most widely recognized. Word embeddings represent individual words as low-dimensional, real-valued vectors. We introduce a method that uses the word embeddings of the terms in a given phrase, excluding the ambiguous word, together with the word embeddings of the glosses, synonyms, and examples of the ambiguous word obtained from the Marathi WordNet. The most suitable sense of the ambiguous word is then determined from the context and these word embeddings. Our results indicate that including the embeddings of synonyms and examples of ambiguous words increases disambiguation accuracy.
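The general idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how vectors are combined or compared, so averaging and cosine similarity are assumptions here. The tiny `EMBEDDINGS` table, the English stand-in words, the `SENSES` inventory, and the function names are all hypothetical; a real system would use pretrained Marathi embeddings and sense entries from the Marathi WordNet.

```python
import math

# Hypothetical 3-dimensional embeddings, for illustration only.
# A real system would load pretrained Marathi word vectors instead.
EMBEDDINGS = {
    "river": [0.9, 0.1, 0.0],
    "water": [0.8, 0.2, 0.1],
    "shore": [0.9, 0.0, 0.1],
    "money": [0.1, 0.9, 0.1],
    "loan": [0.0, 0.8, 0.2],
    "deposit": [0.1, 0.9, 0.0],
    "sat": [0.4, 0.4, 0.4],
}


def mean_vector(words):
    """Average the embeddings of the known words; return None if none are known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def disambiguate(context_words, target, senses):
    """Pick the sense whose gloss/synonym/example vector is closest to the context."""
    # The ambiguous word itself is excluded from the context vector.
    context = mean_vector([w for w in context_words if w != target])
    best_sense, best_score = None, float("-inf")
    for sense_id, sense in senses.items():
        # Assumed sense signature: gloss words + synonyms + example words.
        signature = sense["gloss"] + sense["synonyms"] + sense["examples"]
        sense_vec = mean_vector(signature)
        if context is None or sense_vec is None:
            continue
        score = cosine(context, sense_vec)
        if score > best_score:
            best_sense, best_score = sense_id, score
    return best_sense


# Hypothetical sense inventory standing in for WordNet entries.
SENSES = {
    "bank_river": {"gloss": ["shore"], "synonyms": ["shore"], "examples": ["river", "water"]},
    "bank_finance": {"gloss": ["money"], "synonyms": [], "examples": ["loan", "deposit"]},
}

print(disambiguate(["sat", "bank", "river", "water"], "bank", SENSES))  # prints: bank_river
```

Dropping the `synonyms` and `examples` lists from the signature leaves a gloss-only variant, which is one way to compare the configurations the abstract contrasts.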
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license requires users to give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.