A Comprehensive Survey on Abstractive Text Summarization of Devanagari Script Based Hindi Language

Authors

  • Aparna Madhukar Mete, Manikrao Laxmanrao Dhore

Keywords:

abstractive text summarization, Devanagari script, Hindi and Marathi, generated summaries, neural network, transfer learning.

Abstract

With the exponential growth of digital content in Indic languages, there is an increasing demand for advanced Natural Language Processing (NLP) techniques tailored to specific scripts. This research explores the landscape of abstractive text summarization in the Devanagari script, with a particular emphasis on Hindi and Marathi, two prominent languages that use this script. The survey begins with an overview of abstractive text summarization techniques, highlighting the challenges and opportunities specific to languages written in Devanagari. It then examines the existing methodologies, models, and datasets used for abstractive summarization in Hindi and Marathi, offering insights into the unique linguistic features that affect summarization tasks. Furthermore, the survey discusses how linguistic nuances such as compound words, inflections, and contextual dependencies affect the efficiency of abstractive summarization models. The research reviews the major advancements in neural network (NN) architectures and pre-trained language models applied to abstractive summarization, and analyzes the strengths and limitations of these models with respect to factors such as model size, training data, and computational resources. In addition to model-centric approaches, the survey explores the role of domain-specific datasets and transfer learning in enhancing the performance of abstractive summarization systems for Devanagari-script languages. Finally, it sheds light on the evaluation metrics and benchmarks commonly employed to measure the quality of generated summaries, and addresses the challenges of cross-lingual evaluation.
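
The abstract refers to the evaluation metrics commonly used to measure the quality of generated summaries. As an illustration only, the minimal Python sketch below computes ROUGE-1 F1 (clipped unigram overlap), a widely used summarization metric chosen here as a representative example rather than as the survey's own method. The function name `rouge1_f1` and the Hindi sample sentences are hypothetical. The sketch assumes plain whitespace tokenization with no stemming or normalization, which also shows why morphology matters for Devanagari text: the related forms घोषित and घोषणा receive no credit against each other.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1 with naive whitespace tokenization (assumption: no stemming)."""
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    # Clipped overlap: each token counts at most min(count in reference, count in candidate).
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentences (not taken from any dataset).
reference = "सरकार ने नई शिक्षा नीति की घोषणा की"
candidate = "सरकार ने शिक्षा नीति घोषित की"
print(round(rouge1_f1(reference, candidate), 3))  # 0.714
```

With these sentences, 5 of the 6 candidate tokens and 5 of the 8 reference tokens overlap, giving F1 = 5/7 ≈ 0.714. A real evaluation would more likely use an established ROUGE implementation configured with tokenization suited to Devanagari text.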

Published

24.03.2024

How to Cite

Mete, A. M., & Dhore, M. L. (2024). A Comprehensive Survey on Abstractive Text Summarization of Devanagari Script Based Hindi Language. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 3604–3620. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5997

Issue

Vol. 12, No. 3 (2024)

Section

Research Article