Long Document Classification using Hierarchical Attention Networks

Authors

  • Ayesha Mariyam, SK. Althaf Hussain Basha, S. Viswanadha Raju

Keywords:

long document classification, word2vec, sentence2vec, attention model

Abstract

Online comments and reviews are a defining form of expression in the current era, and they call for careful conceptual discrimination and opinion analysis. Full-text analysis using document classification methods is the traditional approach. Deep learning methods have been applied to text classification of long legal documents. Word-level classification captures the semantic sense on which a classification rests, and statistical term-weighting methods such as TF, TF-PDF, and TF-IDF also exist. Prior successful research treats document classification as sentence-level classification using sentence vectors. The challenge lies in classifying long legal documents based on the sense of their sentences. This article presents the challenges in existing proposals and ideas for implementing long document classification using attention learning. A CNN-based attention learning model is described for classification on the BBC Web News dataset. The results are evaluated using performance metrics and an ROC graph, and the model achieves an estimated accuracy of 96%.
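To make the idea of attention over sentence vectors concrete, the following is a minimal sketch of hierarchical attention pooling: word vectors are pooled into sentence vectors, and sentence vectors into a document vector. It is not the authors' CNN-based model. The function names, the dot-product scoring, the fixed context vectors, and the toy two-dimensional embeddings are all assumptions made for illustration; a trained model would learn the context vectors and would encode the words first.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Weight each vector by its dot product with a context vector.

    Returns the weighted sum of the vectors and the attention weights.
    The context vector is a hypothetical stand-in for a learned parameter.
    """
    scores = [sum(a * b for a, b in zip(vec, context)) for vec in vectors]
    weights = softmax(scores)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, vectors))
        for d in range(len(context))
    ]
    return pooled, weights


def document_vector(doc, word_context, sentence_context):
    """Pool words into sentence vectors, then sentences into one document vector."""
    sentence_vectors = [attention_pool(sentence, word_context)[0] for sentence in doc]
    return attention_pool(sentence_vectors, sentence_context)


# Toy document: two sentences of 2-dimensional word vectors (made-up values).
doc = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]],
]
doc_vec, sentence_weights = document_vector(doc, [1.0, 0.0], [0.0, 1.0])
```

In a full classifier, `doc_vec` would be passed to a final layer that predicts the document category, and `sentence_weights` indicates how much each sentence contributed to that vector.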

Author Biography

Ms. Ayesha Mariyam
Research Scholar, CSE, Jawaharlal Nehru Technological University, Hyderabad, India.
ayesha.mariyam84@gmail.com

SK. Althaf Hussain Basha
Professor and Head, Computer Science and Engineering, Krishna Chaitanya Institute of Technology and Sciences, Markapur
althafbashacse@gmail.com

S. Viswanadha Raju
Professor, Computer Science and Engineering, JNTUH College of Engineering, Jagtial
svraju.jntu@gmail.com

Published

27.01.2023

How to Cite

Ayesha Mariyam, SK. Althaf Hussain Basha, S.Viswanadha Raju. (2023). Long Document Classification using Hierarchical Attention Networks. International Journal of Intelligent Systems and Applications in Engineering, 11(2s), 343 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/2708

Issue

Section

Research Article