“A Neural Word Embedding Based Transformer Model for Improving Malayalam Question Answering on Health Domain”


  • Liji S. K., Department of Computer Science, Sullamussalam Science College, Malappuram, Kerala, India
  • Muhamed Ilyas P., Department of Computer Science, Sullamussalam Science College, Malappuram, Kerala, India


Information Retrieval, Malayalam Question Answering, Word Embedding, Recurrent Neural Networks (RNNs), Health Domain, Bidirectional Encoder Representations from Transformers (BERT)


The pursuit of effective human-computer interaction has been ongoing since the emergence of modern computing and Artificial Intelligence. Natural Language Processing techniques play a crucial role in implementing Question Answering and Information Retrieval systems. This paper introduces a novel approach that employs a Bidirectional Encoder Representations from Transformers (BERT) model, based on neural word embeddings, to enhance Malayalam Question Answering in the health domain. The study involves training and fine-tuning the BERT model specifically for Question-Answering tasks, using an annotated Malayalam SQuAD dataset related to health. The system demonstrates strong performance with an F1 score of 86%, surpassing the accuracy of our earlier models based on word embeddings and Recurrent Neural Networks (RNNs).
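The abstract reports an F1 score of 86% on SQuAD-style question answering. Assuming the paper follows the standard SQuAD evaluation, F1 here is the token-overlap harmonic mean of precision and recall between the predicted answer span and the gold answer. A minimal sketch of that metric (function name and examples are illustrative, not from the paper):

```python
from collections import Counter

def squad_f1(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer span,
    as used in SQuAD-style QA evaluation."""
    pred_tokens = prediction.split()
    gold_tokens = ground_truth.split()
    # If either answer is empty, F1 is 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Count tokens shared by prediction and gold (multiset intersection).
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# An exact match scores 1.0; partial overlap scores between 0 and 1.
print(squad_f1("paracetamol 500 mg", "paracetamol 500 mg"))  # 1.0
print(squad_f1("500 mg", "paracetamol 500 mg"))
```

A dataset-level score is then the mean of this per-question F1 over the evaluation set (often taking the maximum over multiple gold answers per question).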






How to Cite

S. K., L., & Ilyas P., M. (2024). A Neural Word Embedding Based Transformer Model for Improving Malayalam Question Answering on Health Domain. International Journal of Intelligent Systems and Applications in Engineering, 12(14s), 542–547. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4691



Research Article