Revolutionizing Single Document Extractive Text Summarization with Improved PageRank
Keywords:
TextRank, PageRank, Modified PageRank, Cosine similarity, Dimensionality Reduction
Abstract
The exponential growth of data on the internet has made it difficult to extract relevant information within a limited time. Effective and efficient automatic text summarization is a key approach to this problem. This paper focuses on extractive summarization of a single document, taking into account the type of document and the type of summary. It introduces the improved-PageRank algorithm, a graph-based text summarization technique that captures the aboutness of text content and is an enhanced version of the modified PageRank algorithm. The proposed technique is evaluated against two other approaches, TextRank and modified PageRank, on datasets from the Document Understanding Conference (DUC 2002, DUC 2003, and DUC 2005). ROUGE value, range, and coefficient of variation are used to compare the effectiveness of each algorithm. In these experiments, the improved-PageRank technique gives the best results among the three techniques.
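As a rough illustration of the graph-based family that the abstract compares against, the following is a minimal sketch of a TextRank-style baseline: sentences become nodes, cosine similarity between sentence vectors becomes the edge weight, and a PageRank power iteration scores the nodes. It is not the improved-PageRank algorithm, whose details the abstract does not give. The function names, the naive sentence splitter, the raw term-frequency vectors, and the damping factor of 0.85 are all assumptions made for this example.

```python
import re
import numpy as np

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tf_matrix(sentences):
    # One row per sentence, one column per vocabulary word, raw term counts.
    tokens = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    vocab = {w: i for i, w in enumerate(sorted({w for t in tokens for w in t}))}
    m = np.zeros((len(sentences), len(vocab)))
    for row, words in enumerate(tokens):
        for w in words:
            m[row, vocab[w]] += 1.0
    return m

def cosine_similarity_matrix(m):
    # Normalise rows to unit length; zero rows are left as zeros.
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0
    unit = m / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)  # no self-loops in the sentence graph
    return sim

def pagerank(weights, damping=0.85, tol=1e-8, max_iter=200):
    # Power iteration on a column-stochastic transition matrix.
    n = weights.shape[0]
    col_sums = weights.sum(axis=0)
    safe = np.where(col_sums > 0, col_sums, 1.0)
    # Nodes with no outgoing weight spread their score uniformly.
    transition = np.where(col_sums > 0, weights / safe, 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        updated = (1.0 - damping) / n + damping * (transition @ scores)
        if np.abs(updated - scores).sum() < tol:
            return updated
        scores = updated
    return scores

def summarize(text, k=2):
    # Return the k highest-scoring sentences in their original order.
    sentences = split_sentences(text)
    if len(sentences) <= k:
        return sentences
    scores = pagerank(cosine_similarity_matrix(tf_matrix(sentences)))
    top = sorted(np.argsort(-scores)[:k])
    return [sentences[i] for i in top]
```

Calling `summarize(document_text, k=3)` would return three sentences from the input. A real evaluation along the lines described above would score such output with ROUGE against DUC reference summaries, which this sketch does not attempt.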
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, readers must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.