Morphological and Contextual Meaning Analysis in Colloquial Tamil Using Yarowsky Influenced Modified Apriori Algorithm
Keywords:
Associative datasets, Prominent root words, 2-Level rule mining, Yarowsky algorithm, Modified Apriori algorithm.
Abstract
Colloquial Tamil, often referred to as "spoken Tamil" or "conversational Tamil," is the informal variant of the Tamil language used in everyday conversation among native speakers. It is rich in idiomatic expressions, and its pronunciation can vary significantly from region to region. Its use is influenced by social factors such as age, education level, and urban versus rural upbringing, and it serves as a marker of identity and belonging within Tamil-speaking communities. Morphological analysis of such content is urgently needed, especially as Tamil content influenced by regional dialects is used across social media. Morphological analysis of informal Tamil content is complex because the same word can carry different meanings. To arrive at the meaning a word actually conveys, it is imperative to consider the words that co-occur with it in the contextual sentence. These co-occurring words are rooted to the prominent word. In this paper, a modified Apriori algorithm influenced by the Yarowsky algorithm is used to support and map the actual contextual meaning of the word under investigation. This algorithm, together with the frequent itemsets it generates, proves to be a suitable fit for the morphological analyzer in determining the actual contextual meaning. The frequent itemsets are often referred to as associative datasets. As per the literature survey, experimental results demonstrate that the Apriori-based approach outperforms the 2-level grammar-based approach in both prediction and time complexity. This article concludes with the suggestion that prediction accuracy can be further improved by applying the Yarowsky algorithm's initial pruning to the associative datasets.
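The idea of mining co-occurring context words as frequent itemsets, after an initial seed-based pruning, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a plain textbook Apriori rather than the paper's modified variant, and the function names (`apriori`, `seed_prune`), the support threshold, and the toy sentences are all assumptions made for illustration. English glosses stand in for Tamil tokens, and the pruning step is a much-simplified reading of the Yarowsky seed-collocation idea.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    Textbook Apriori (not the paper's modified variant).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def seed_prune(transactions, target, seeds):
    """Simplified Yarowsky-style pruning (hypothetical helper): keep only
    sentences containing the target word and at least one seed collocate."""
    seeds = set(seeds)
    return [t for t in transactions if target in t and seeds & set(t)]


# Toy data: each sentence is a bag of context words around an ambiguous word.
sentences = [
    ["bank", "river", "water"],
    ["bank", "river", "fish"],
    ["bank", "money", "loan"],
    ["bank", "money", "deposit"],
    ["bank", "river", "water", "boat"],
]

# Restrict to one sense via the seed collocate "river", then mine itemsets.
pruned = seed_prune(sentences, target="bank", seeds={"river"})
result = apriori(pruned, min_support=2)

for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In this sketch, the surviving itemsets (for example the set containing "bank", "river", and "water") play the role of the associative datasets that a morphological analyzer could consult to pick the contextual meaning of the target word.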
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.