Revolutionizing Single Document Extractive Text Summarization with Improved PageRank

Authors

  • Jyotirmayee Rautaray, Sangram Panigrahi, Ajit Kumar Nayak

Keywords:

TextRank, PageRank, Modified PageRank, Cosine similarity, Dimensionality Reduction

Abstract

With the exponential growth of data on the internet, extracting the required information within a given time frame has become quite challenging. Effective and efficient automatic text summarization is a crucial approach to this problem. This paper focuses on extractive text summarization of a single document, taking into account the type of document and summary. It introduces the improved-PageRank algorithm, a graph-based text summarization technique that captures the aboutness of the text content and is an enhanced version of the modified PageRank algorithm. The proposed technique is evaluated against two other approaches, TextRank and modified PageRank, on the Document Understanding Conference datasets DUC 2002, DUC 2003 and DUC 2005. ROUGE scores, range, and coefficient of variation are used to compare the effectiveness of each algorithm. The experiments indicate that the improved-PageRank technique gives the best results among the three techniques compared.
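For readers unfamiliar with graph-based extractive summarization, the general idea behind the TextRank-style baseline named in the abstract can be sketched as follows. This is only an illustrative sketch, not the paper's improved-PageRank algorithm, whose details the abstract does not give. The function names, the naive sentence splitter, the bag-of-words vectors, and the damping factor of 0.85 are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Assumption: a naive splitter on ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence):
    # Assumption: lowercase word counts stand in for a real sentence vector.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pagerank(weights, damping=0.85, tol=1e-8, max_iter=200):
    # Weighted PageRank by power iteration over a similarity matrix.
    n = len(weights)
    if n == 0:
        return []
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if out_sum[j] > 0:
                    rank += weights[j][i] / out_sum[j] * scores[j]
                else:
                    # Node with no edges: spread its score evenly.
                    rank += scores[j] / n
            new_scores.append((1 - damping) / n + damping * rank)
        delta = sum(abs(x - y) for x, y in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def summarize(text, k=2):
    # Rank sentences on the similarity graph; return the top k in document order.
    sentences = split_sentences(text)
    vectors = [bag_of_words(s) for s in sentences]
    n = len(sentences)
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(document_text, k=3)` returns the three highest-ranked sentences in their original order. Sentences that share no vocabulary with the rest of the document receive only the baseline score and are therefore unlikely to be selected.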

Published

24.03.2024

How to Cite

Jyotirmayee Rautaray. (2024). Revolutionizing Single Document Extractive Text Summarization with Improved PageRank. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 3216–3228. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5927

Section

Research Article