Extractive Summarization of Kannada Multi Documents using LDA
Keywords:
Kannada documents, relevant status value, topic modelling, LDA, multi-document summarization
Abstract
Automatic Text Summarization (ATS) condenses a source text and presents it to the user in a more manageable form while keeping its main content intact. Many summarization strategies have been studied in the literature for languages with substantial resources. For languages with limited resources, such as Kannada, ATS remains a difficult undertaking: there is no reference corpus, and adequate language processing tools are lacking. Because no standard collection was available, we prepared a dataset of news stories written in Kannada. This work presents an extractive, topic-modelling approach to multi-document summarization of Kannada news articles. First, latent Dirichlet allocation (LDA) is used to identify the latent topics in the contents of the document cluster. The vector space model is then used to build the sentence vectors and topic vectors of the input documents. Sentences are ranked by their relevant status value, computed from the topic and sentence vectors, and the summary is assembled so that redundancy is minimized.
Evaluation on Kannada reports shows that the proposed technique produces summaries that are closer to human-written summaries than those produced by existing text summarization algorithms.
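The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the relevant status value is the cosine similarity between a sentence's term-frequency vector and a topic vector, it takes the topic vector as a given input (in the paper it would come from LDA), and it uses a hypothetical `max_overlap` threshold to keep redundant sentences out of the summary. The function names and the English toy tokens are illustrative only; Kannada text would first need its own tokenization.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Bag-of-words term-frequency vector for one sentence."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse vectors (dict-like term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def summarize(sentences, topic_vector, k=2, max_overlap=0.5):
    """Greedy extractive selection.

    Sentences are ranked by similarity to the topic vector (assumed here to
    play the role of the relevant status value). A candidate is skipped when
    its similarity to an already selected sentence exceeds max_overlap.
    Selected sentences are returned in their original order.
    """
    vectors = [tf_vector(s.split()) for s in sentences]
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(vectors[i], topic_vector),
        reverse=True,
    )
    chosen = []
    for i in ranked:
        if len(chosen) == k:
            break
        if all(cosine(vectors[i], vectors[j]) <= max_overlap for j in chosen):
            chosen.append(i)
    return [sentences[i] for i in sorted(chosen)]


# Toy input: whitespace-tokenized sentences pooled from several documents.
sentences = [
    "rain floods city roads",
    "floods rain city roads closed",
    "team wins cricket match",
    "city council meets today",
]
# Hypothetical topic-word weights standing in for one LDA topic.
topic_vector = {"rain": 0.4, "floods": 0.4, "city": 0.2}

summary = summarize(sentences, topic_vector, k=2)
print(summary)
```

On this toy input the second sentence scores well against the topic but is skipped because it nearly duplicates the first, so the next non-redundant sentence is chosen instead. The threshold and the choice of raw term frequencies are design assumptions; TF-IDF weights or a different redundancy penalty would fit the same structure.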
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets others share and adapt the material, provided they give appropriate credit, provide a link to the license, and indicate if changes were made. If they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.