Designing of a Novel Framework for Marathi Natural Language Processing: MR-LIWC2015

Authors

  • Saroj Date, Dr. Babasaheb Ambedkar Marathwada University, Chh. Sambhajinagar, Maharashtra, India
  • Sachin N. Deshmukh, Dr. Babasaheb Ambedkar Marathwada University, Chh. Sambhajinagar, Maharashtra, India
  • Ryan Boyd, University of Texas at Austin, USA
  • Ashwini Ashokkumar, University of Texas at Austin, USA
  • James W. Pennebaker, University of Texas at Austin, USA

Keywords:

English LIWC, LIWC, Marathi, Marathi LIWC, Marathi translation, NLP, Natural language processing, Sentiment analysis, translation, translation procedure, translation process

Abstract

Linguistic analysis has gained significant prominence as a means of understanding human behaviour, emotions, and psychological states in domains including psychology, the social sciences, and computational linguistics. Linguistic Inquiry and Word Count (LIWC), a widely used tool developed by the American social psychologist James W. Pennebaker and his team at the University of Texas at Austin, enables automated linguistic analysis of text and provides insights into its psychological and emotional dimensions. Originally developed for English, it has since been adapted to several other languages, such as German, Dutch, Spanish, Chinese, Turkish, and French. Its applicability nevertheless remains restricted to English and these few other languages, which limits its use in multilingual contexts. In particular, the tool is not yet available for Marathi, a major language spoken by the people of Maharashtra, India. This paper presents a novel framework for the development and evaluation of a Marathi translation of the LIWC dictionary, aiming to extend its utility to the Marathi-speaking population. The Marathi version is based on the English LIWC-2015 dictionary. The work is unique in that it is the first LIWC translation for any Indian language. Development proceeds in several steps: initial translation and wildcard (*) expansion, dictionary expansion, linguistic analysis, wordlist development, cultural adaptation, wordlist validation, refinement, equivalence research, addition of summary variables, and packaging of the final dictionary in the official LIWC format. The Marathi LIWC is evaluated on a diverse dataset of Marathi text samples comprising social media posts, speech transcripts, blogs, short stories, and book summaries.
The performance of the translated dictionary is assessed by its ability to accurately capture the linguistic features, emotional tones, and psychological constructs present in the Marathi language. To evaluate the effectiveness of the Marathi LIWC, a diverse dataset of Marathi texts was analyzed using both the original English LIWC and the newly developed Marathi LIWC. The results show that the Marathi LIWC remains aligned with the original LIWC's underlying linguistic and psychological dimensions while accommodating the specifics of the Marathi language. The translated dictionary exhibited promising reliability and validity in capturing linguistic and psychological features of Marathi texts.
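
The two mechanical ideas named in the abstract, wildcard (*) expansion and dictionary-based category counting, can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the MR-LIWC2015 dictionary: the entries, the vocabulary, and the function names `expand_wildcards` and `category_percentages` are hypothetical, and English placeholder words stand in for Marathi ones. It assumes that a trailing `*` matches any word beginning with the stem and that a category score is the percentage of tokens belonging to that category.

```python
import string
from collections import Counter

# Hypothetical toy dictionary (NOT taken from MR-LIWC2015): entry -> categories.
# A trailing "*" is a wildcard matching any word that starts with the stem.
TOY_DICTIONARY = {
    "happy": ["posemo"],
    "joy*": ["posemo"],
    "sad*": ["negemo"],
    "we": ["pronoun"],
}


def expand_wildcards(dictionary, vocabulary):
    """Replace each 'stem*' entry with the vocabulary words starting with the stem."""
    expanded = {}
    for entry, categories in dictionary.items():
        if entry.endswith("*"):
            stem = entry[:-1]
            matches = [word for word in vocabulary if word.startswith(stem)]
        else:
            matches = [entry]
        for word in matches:
            expanded.setdefault(word, set()).update(categories)
    return expanded


def category_percentages(text, expanded):
    """Return each matched category's share of all tokens, as a percentage."""
    # Whitespace tokenization with ASCII punctuation stripped; a real Marathi
    # pipeline would need a tokenizer that handles Devanagari punctuation.
    tokens = [t.strip(string.punctuation) for t in text.lower().split()]
    tokens = [t for t in tokens if t]
    counts = Counter()
    for token in tokens:
        for category in expanded.get(token, ()):
            counts[category] += 1
    total = len(tokens)
    return {c: 100.0 * n / total for c, n in counts.items()} if total else {}


# Hypothetical vocabulary against which the wildcard entries are expanded.
vocabulary = ["joy", "joyful", "sad", "sadness", "table"]
expanded = expand_wildcards(TOY_DICTIONARY, vocabulary)
scores = category_percentages("We were joyful, not sad.", expanded)
```

Expanding wildcards into full word forms before translation matters because a stem-plus-asterisk entry in one language rarely maps onto a single stem in another; each expanded word can then be translated and categorized individually.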

Published

11.01.2024

How to Cite

Date, S., Deshmukh, S. N., Boyd, R., Ashokkumar, A., & Pennebaker, J. W. (2024). Designing of a Novel Framework for Marathi Natural Language Processing: MR-LIWC2015. International Journal of Intelligent Systems and Applications in Engineering, 12(11s), 01–14. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4414

Section

Research Article