Optimizing SMS Spam Detection: Leveraging the Strength of a Voting Classifier Ensemble

Authors

  • Manas Ranjan Bishi, N Sardhak Manikanta, G Hari Surya Bharadwaj, P Siva Krishna Teja, G Rama Koteswara Rao

Keywords:

Phishing Offensives, Sophisticated, Machine Learning, Leverage, SMS Spammers.

Abstract

The paper addresses the challenge of SMS spam, which has risen significantly in recent years. This work proposes an ensemble learning approach using Support Vector Machine (SVM), Naive Bayes, Extra Trees, and a Voting Classifier to enhance SMS spam detection. The system employs diverse machine learning algorithms, each chosen and fine-tuned for performance. The ensemble, centered on the Voting Classifier strengthened by a Random Forest Classifier, plays a crucial role in identifying spam messages. Evaluation uses Accuracy, Precision, Recall, and F1-score to give a comprehensive view of the model's effectiveness. Dataset exploration revealed unexpected dynamics that challenged initial assumptions. For instance, spam messages showed higher word counts, potentially reflecting deliberate tactics employed by spammers. Additionally, the identification of over 6,000 duplicate texts within spam messages raises questions about spammers' methodologies. The development process incorporates data preprocessing steps such as tokenization, lowercasing, and stop-word removal. Training with SVM, Naive Bayes, and Random Forest classifiers leverages their individual strengths, while a voting ensemble method enhances the model's robustness and mitigates potential biases. The paper concludes by demonstrating the practical application of the SMS spam detector, which achieves an accuracy of 94% through the combined application of these machine learning algorithms. This systematic approach positions the paper as a contribution to the field of SMS spam detection, addressing real-world challenges in digital communication security.
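The preprocessing, voting, and evaluation steps named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the stop-word list, the function names (`preprocess`, `hard_vote`, `evaluate`), and the hand-written votes that stand in for trained SVM, Naive Bayes, and Random Forest outputs are all assumptions made for illustration, and the abstract does not state whether the ensemble uses hard (majority) or soft voting.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; the paper does not specify one).
STOP_WORDS = {"a", "an", "the", "to", "is", "you", "your", "for", "and", "of", "at"}


def preprocess(text):
    """Lowercase the text, tokenize on alphanumeric runs, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def hard_vote(predictions):
    """Return the majority label among the base classifiers' predictions.

    With an odd number of voters and two labels, ties cannot occur.
    """
    return Counter(predictions).most_common(1)[0][0]


def evaluate(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical per-message votes from three base classifiers
# (stand-ins for trained models; not data from the paper).
votes = [
    ("spam", "spam", "ham"),
    ("ham", "ham", "ham"),
    ("spam", "ham", "ham"),
    ("spam", "spam", "spam"),
]
y_true = ["spam", "ham", "spam", "spam"]
y_pred = [hard_vote(v) for v in votes]

tokens = preprocess("WINNER! Claim your FREE prize at 08001234567")
metrics = evaluate(y_true, y_pred)
print(tokens)
print(y_pred)
print(metrics)
```

In this toy run the ensemble misses the third message because two of the three voters label it ham, which lowers recall while precision stays at 1.0. This is why the abstract reports Precision, Recall, and F1-score alongside Accuracy.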

Published

24.03.2024

How to Cite

Manas Ranjan Bishi, N Sardhak Manikanta, G Hari Surya Bharadwaj, P Siva Krishna Teja, & G Rama Koteswara Rao. (2024). Optimizing SMS Spam Detection: Leveraging the Strength of a Voting Classifier Ensemble. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 2458–2469. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5717

Section

Research Article