Identification and Categorization of SMS using Deep Learning and Machine Learning Methods

Authors

  • Sinkon Nayak, Manjusha Pandey, Siddharth Swarup Rautaray

Keywords

Text Message Categorization, Machine Learning, Deep Learning, Word Embedding, Contextual Embedding

Abstract

Text messages (SMS) are short messages used for both personal and professional communication without relying on the internet. Some text messages are essential and others are nonessential, so it is crucial to filter the nonessential messages out from the essential ones. This research work uses various machine learning and deep learning methods to categorize text messages. To extract features from the messages, the study uses word embedding and contextual embedding techniques. Performance is then measured using performance metrics and confusion matrix parameters. With the word embedding-based feature extraction method, Extra Tree and LSTM are the most accurate models, at 96.86% and 98.06% respectively. With the sentence embedding-based feature extraction method, SVM and Bi-directional LSTM are the most accurate, at 99.1% and 99.19% respectively.
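The abstract evaluates its classifiers with performance metrics derived from the confusion matrix. As a minimal sketch of that step, the Python below computes accuracy, precision, recall and F1 from binary predictions. The label encoding (1 = nonessential, 0 = essential), the function names and the toy labels are assumptions made for illustration only; they are not taken from the paper, and the numbers are not the paper's results.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = nonessential, 0 = essential)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth not in (0, 1) or pred not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        else:  # truth == 1 and pred == 0
            counts["fn"] += 1
    return counts


def metrics_from_counts(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive accuracy, precision, recall and F1 from confusion matrix counts."""
    tp, tn, fp, fn = counts["tp"], counts["tn"], counts["fp"], counts["fn"]
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, invented purely to exercise the functions.
toy_true = [1, 0, 0, 1, 0, 1, 0, 0]
toy_pred = [1, 0, 0, 0, 0, 1, 1, 0]
toy_counts = confusion_counts(toy_true, toy_pred)
print(toy_counts)
print(metrics_from_counts(toy_counts))
```

Accuracy alone can be misleading when nonessential messages are a small share of the data, which is one reason to report precision and recall alongside it.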

Published

26.06.2024

How to Cite

Sinkon Nayak. (2024). Identification and Categorization of SMS using Deep Learning and Machine Learning Methods. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 845–852. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6307

Section

Research Article