Extractive Summarization of Kannada Multi Documents using LDA

Authors

  • Veena R., Research Scholar, Sri Siddartha Academy of Higher Education, Tumakuru, India
  • D. Ramesh, Professor and Head, Master of Computer Application, Sri Siddartha Academy of Higher Education, Tumakuru, India
  • Hanumanthappa M., Senior Professor, Department of Computer Science and Applications, Bangalore University, Bengaluru, India

Keywords:

Kannada documents, relevant status value, topic modelling, LDA, multi-document summarization

Abstract

Automatic Text Summarization (ATS) condenses a source text while keeping its main content intact and presents the information to the user in a more manageable form. Many summarization strategies have been studied in the scientific literature for languages with substantial resources. For a low-resource language such as Kannada, however, ATS remains a difficult undertaking: there is no reference corpus, and adequate language processing tools are lacking. Because no standard collection was available, we prepared a dataset of news stories written in Kannada. This work demonstrates an extractive, topic-modelling approach to multi-document summarization of Kannada news articles. First, we apply latent Dirichlet allocation (LDA) to identify the latent topics on which the modelling of the cluster contents is based. The vector space model is then used to create the sentence vector and the dependent vector of each input document. Sentences are ranked according to the topic and sentence vectors of the document, taking into account the appropriate status value. The resulting summary is constructed to maximize non-redundancy.

Evaluation on Kannada news reports shows that the proposed technique produces summaries that are closer to human-written summaries than those produced by existing text summarization algorithms.
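
The pipeline outlined in the abstract (LDA topics, vector-space sentence vectors, topic-based ranking, redundancy control) can be illustrated with a minimal sketch. It is not the authors' implementation. The following are assumptions made only for illustration: LDA is fitted with a small collapsed Gibbs sampler, sentences are already tokenized, a sentence's score is its highest cosine similarity to any topic-word distribution (a stand-in for the status value, whose exact definition the abstract does not give), and redundancy is limited by a hypothetical `max_overlap` threshold. The names `lda_gibbs`, `cosine`, and `summarize` are hypothetical, and the Latin tokens stand in for Kannada words.

```python
import math
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (vocab, topic-word probabilities)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][index[w]] += 1
            topic_total[z] += 1
        assign.append(zs)
    # Resample each token's topic from its conditional distribution.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z, wi = assign[d][i], index[w]
                doc_topic[d][z] -= 1
                topic_word[z][wi] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wi] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wi] += 1
                topic_total[z] += 1
    phi = [
        [(topic_word[k][i] + beta) / (topic_total[k] + n_words * beta)
         for i in range(n_words)]
        for k in range(n_topics)
    ]
    return vocab, phi


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def summarize(docs, n_topics=2, n_sentences=2, max_overlap=0.5):
    """docs: list of documents, each a list of tokenized sentences."""
    sentences = [s for doc in docs for s in doc]
    bags = [[w for s in doc for w in s] for doc in docs]
    vocab, phi = lda_gibbs(bags, n_topics)
    # Vector space model: term-frequency vector per sentence over the vocabulary.
    vectors = []
    for s in sentences:
        counts = Counter(s)
        vectors.append([counts[w] for w in vocab])
    # Assumed score: best match between a sentence and any topic distribution.
    scores = [max(cosine(v, topic) for topic in phi) for v in vectors]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        # Skip sentences too similar to ones already selected.
        if all(cosine(vectors[i], vectors[j]) <= max_overlap for j in chosen):
            chosen.append(i)
        if len(chosen) == n_sentences:
            break
    return [sentences[i] for i in sorted(chosen)]


# Toy cluster of two "documents"; the first sentence is repeated across both.
docs = [
    [["rain", "flood", "river"], ["election", "vote", "party"]],
    [["rain", "flood", "river"], ["cricket", "match", "score"]],
]
summary = summarize(docs, n_topics=2, n_sentences=2)
print(summary)
```

In this sketch the repeated sentence can appear at most once in the output, because its cosine similarity to its own copy is 1.0 and exceeds `max_overlap`. A real system would replace the toy tokens with the output of a Kannada tokenizer and stop-word filter, and would likely use an established LDA library rather than the hand-written sampler.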

Published

24.03.2024

How to Cite

R., V., Ramesh, D., & M., H. (2024). Extractive Summarization of Kannada Multi Documents using LDA. International Journal of Intelligent Systems and Applications in Engineering, 12(18s), 561–570. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5004

Section

Research Article