Identification and Categorization of SMS using Deep Learning and Machine Learning Methods
Keywords:
Text Message Categorization, Machine Learning, Deep Learning, Word Embedding, Contextual Embedding
Abstract
Text messages are short messages used for both personal and professional communication without requiring the internet. Some text messages are essential and others are nonessential, so it is important to filter the nonessential messages out from the essential ones. This research work applies various machine learning and deep learning methods to categorize text messages. To extract features from the messages, the study uses word embedding and contextual (sentence) embedding techniques. Performance is then measured with standard performance metrics and confusion matrix parameters. With word embedding-based features, Extra Trees and LSTM are the most accurate machine learning and deep learning models, reaching 96.86% and 98.06% accuracy, respectively. With sentence embedding-based features, SVM and Bi-directional LSTM are the most accurate, reaching 99.1% and 99.19%, respectively.
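The abstract does not state how per-word vectors are combined into one feature vector per message, or how the reported accuracies relate to the confusion matrix. The following is a minimal sketch under stated assumptions, not the authors' implementation: it mean-pools word vectors (an assumed aggregation step) and derives accuracy, precision, recall and F1 from binary confusion-matrix counts. The toy 2-dimensional vectors, the labels `essential`/`nonessential`, and the function names `mean_embedding`, `confusion_counts` and `metrics` are hypothetical; in practice trained embeddings (for example CBOW or skip-gram vectors) and real classifiers such as Extra Trees or LSTM would take their place.

```python
from collections import Counter


def mean_embedding(message, vectors, dim):
    """Average the vectors of tokens found in `vectors`; unknown tokens are skipped.

    Mean pooling is an assumed aggregation step, not one confirmed by the paper.
    """
    tokens = message.lower().split()
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim  # no known tokens: fall back to a zero vector
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def confusion_counts(y_true, y_pred, positive="nonessential"):
    """Return (TP, FP, FN, TN) for a binary task, treating `positive` as the positive class."""
    counts = Counter()  # missing keys default to 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 derived from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical 2-dimensional word vectors standing in for trained embeddings.
    toy_vectors = {"free": [1.0, 0.0], "prize": [0.0, 1.0]}
    print(mean_embedding("Free prize now", toy_vectors, dim=2))  # [0.5, 0.5]

    # Hypothetical gold labels and predictions from some classifier.
    gold = ["essential", "nonessential", "nonessential", "essential", "essential"]
    pred = ["essential", "nonessential", "essential", "essential", "nonessential"]
    tp, fp, fn, tn = confusion_counts(gold, pred)
    print((tp, fp, fn, tn))  # (1, 1, 1, 2)
    print(metrics(tp, fp, fn, tn))  # accuracy 0.6; precision, recall and F1 0.5
```

Changing `positive` changes which class precision and recall describe, while accuracy stays the same because it counts correct predictions of both classes.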
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license allows readers to share and adapt the material provided that they give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.