Comparative Analysis of Lexical Chain, Bidirectional Encoder Representations from Transformers (BERT) and Graph based Approaches for Condensation of Low Resource Language Documents


  • Pranjali Deshpande Department of Computer Engineering, MKSSS’s Cummins College of Engg. for Women, Pune-52, India
  • Sunita Jahirabadkar Department of Computer and Information Technology (CI), Department of Technology, SPPU, Pune-07, India


Automatic Summarization, BERT, Extractive Summarization, Graph based approach, Low Resource Languages, Lexical Chain


Language is the primary means of human communication, and around 7,000 languages are spoken worldwide. Among them, Low Resource Languages (LRLs) are those that lack the linguistic resources needed to build statistical NLP applications. Writing is the most common way people express and store their thoughts. As technology makes long-distance communication more accessible and internet usage grows, new text is created every second, and not all of it is useful. Document condensation, or summarization, is therefore becoming an increasingly important task. Summaries can be created in two ways: extractive and abstractive. An extractive summary retains the essential phrases and sentences of the original document, whereas an abstractive summary rephrases its main sentences. Summarization is more difficult for LRL documents. This study focuses on the condensation, or automatic summarization, of LRL documents using BERT, lexical chain, and graph-based approaches.
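To make the extractive, graph-based idea concrete, here is a minimal sketch in the style of TextRank. It is not the authors' implementation: the function names (`split_sentences`, `rank_sentences`, `extractive_summary`), the word-overlap similarity, and the naive sentence splitter are all assumptions chosen for illustration. A real LRL pipeline would need a language-specific tokenizer and possibly stemming.

```python
import math
import re

# Punctuation stripped from tokens; includes the Devanagari danda (U+0964).
PUNCT = ".,!?;:\"'()\u0964"


def split_sentences(text):
    """Naive splitter on '.', '!', '?' and the danda (an assumption, not a real tokenizer)."""
    parts = re.split(r"(?<=[.!?\u0964])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Whitespace tokens with edge punctuation removed; avoids \\w, which splits on Devanagari vowel signs."""
    words = (w.strip(PUNCT) for w in sentence.lower().split())
    return {w for w in words if w}


def similarity(a, b):
    """Shared-word count normalized by log sentence lengths (TextRank-style)."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over the sentence-similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def extractive_summary(text, k=2):
    """Return the k highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Because the summary is assembled only from sentences that already exist in the input, this is extractive by construction. An abstractive system would instead generate new sentences.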








How to Cite

Deshpande, P., & Jahirabadkar, S. (2024). Comparative Analysis of Lexical Chain, Bidirectional Encoder Representations from Transformers (BERT) and Graph based Approaches for Condensation of Low Resource Language Documents. International Journal of Intelligent Systems and Applications in Engineering, 12(16s), 08–15.



Research Article