SummaGen: Next-Generation Seq-to-Seq Model for Summarizing Unstructured Text

Authors

  • Sachin Solanki, Medi-Caps University, Indore – 453331, INDIA
  • Suresh Jain, Medi-Caps University, Indore – 453331, INDIA
  • Kailash Chandra Bandhu, Medi-Caps University, Indore – 453331, INDIA

Keywords

Deep learning (DL), Automatic text summarization (ATS), LSTM, ROUGE metric, CNN/Daily Mail, Sequence-to-Sequence (Seq-to-Seq)

Abstract

In today’s digital era, a large volume of textual data is generated every second. A web search never says “Sorry! Unable to find it on the internet”; it always returns plenty of suggestions and data, and it is practically impossible to go through all of it before reaching a decision. With this exponential growth of textual information, automatic text summarization has emerged as a crucial answer. Redundancy, coherence, co-reference, and the semantic links between words and sentences are only a few of the concerns that can be prioritized to strengthen the quality of a summary. In this research, we examine how improving the semantic link between words and sentences can help produce a more accurate generated summary. The proposed technique generates text summaries using a deep learning model based on a Seq-to-Seq LSTM encoder-decoder. Sentence summaries and dictionaries are mapped against one another to measure how close they are conceptually. The method was tested on the publicly available CNN/Daily Mail dataset, which contains unstructured news text. Using ROUGE scores (ROUGE-1, ROUGE-2, and ROUGE-L), we evaluate how well our system performs in comparison to the current gold standard for extractive text summarization. With the Seq-to-Seq model, the proposed approach achieved a 42.74 percent ROUGE-1 score, a 12.46 percent ROUGE-2 score, and a 43.01 percent ROUGE-L score.
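For readers unfamiliar with the reported metrics, the following is a minimal sketch of how ROUGE-N and ROUGE-L F1 scores are computed from tokenized candidate and reference summaries. This is an illustration of the metric definitions only, not the authors' evaluation pipeline; published scores are typically produced with a full ROUGE toolkit that adds stemming and other preprocessing.

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1: n-gram overlap between candidate and reference token lists."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def rouge_l(candidate, reference):
    """ROUGE-L F1: based on the longest common subsequence (LCS) of the token lists."""
    m, k = len(candidate), len(reference)
    dp = [[0] * (k + 1) for _ in range(m + 1)]  # dp[i][j] = LCS length of prefixes
    for i in range(m):
        for j in range(k):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if candidate[i] == reference[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    lcs = dp[m][k]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / m, lcs / k
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_n("the cat sat".split(), "the cat lay".split(), n=1)` scores the two unigram matches out of three tokens on each side. ROUGE-1 rewards word overlap, ROUGE-2 rewards local fluency via bigrams, and ROUGE-L rewards in-order (not necessarily contiguous) matches.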




Published

25.12.2023

How to Cite

Solanki, S., Jain, S., & Bandhu, K. C. (2023). SummaGen: Next-Generation Seq-to-Seq Model for Summarizing Unstructured Text. International Journal of Intelligent Systems and Applications in Engineering, 12(2), 55–61. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4220

Issue

Section

Research Article