A Framework for Spam Toxicity Level Detection using Machine Learning Models

Authors

  • K Aditya Shastry, Akshatha G C, Naresh E, Karthik D U, Mohan Kumar T G

Keywords:

Machine Learning, Spam, Toxicity, Classification.

Abstract

Electronic mail (email) is a means of conveying a message to an individual or a group over the internet. When email first became popular, its use was largely limited to institutions and scientific research. As the technology has spread, almost anyone can now hold an email address in his or her own name. With this growth in use, billions of emails are sent as promotions, advertisements, and spam, which makes mailboxes hard to manage and leaves users struggling to locate their essential messages. In this work, a framework is proposed for classifying emails into multiple classes (bills, promotions, personal, spam, and OTP) so that users can easily identify the relevant ones. The framework uses supervised machine learning (ML) models, namely random forest (RF), support vector machine (SVM), naïve Bayes (NB), k-nearest neighbour (k-NN), and decision tree (DT), to classify the emails. A web application was also developed for the same purpose. The results showed that the RF and k-NN classifiers achieved higher accuracy than the other classifiers.
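
To make the multi-class setting concrete, the following is a minimal sketch of one of the model families named in the abstract, a multinomial naïve Bayes classifier with Laplace smoothing, written with the Python standard library only. It is not the authors' implementation: the training emails are invented for illustration, the function names (tokenize, train_nb, predict_nb) are hypothetical, and only the five class labels are taken from the abstract.

```python
import math
import re
from collections import Counter, defaultdict

# Toy training set invented for illustration; NOT data from the paper.
# Only the five class labels come from the abstract.
TRAIN = [
    ("your electricity bill of 40 dollars is due on friday", "bills"),
    ("invoice attached please pay your bill amount", "bills"),
    ("huge sale get 50 percent discount on shoes today", "promotions"),
    ("new offer exclusive discount coupon for members", "promotions"),
    ("hi are we still meeting for lunch tomorrow", "personal"),
    ("happy birthday hope to see you this weekend", "personal"),
    ("you won a free lottery prize claim money now", "spam"),
    ("click now to claim free money urgent winner", "spam"),
    ("your otp is 482913 do not share this code", "otp"),
    ("use verification code 771204 to login otp valid 10 minutes", "otp"),
]


def tokenize(text):
    """Lowercase the text and keep alphabetic words only (digits are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def train_nb(samples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior under naive Bayes."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior: share of training emails carrying this label.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # words never seen in training carry no evidence
            # Laplace (add-one) smoothed log likelihood of the word.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TRAIN)
print(predict_nb(model, "your otp code is 123456"))      # otp
print(predict_nb(model, "claim your free prize money"))  # spam
```

A comparison like the one reported in the abstract would train each of the five model types on the same labelled emails and measure accuracy on held-out messages; the sketch above covers a single model only and makes no claim about relative performance.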

Published

24.03.2024

How to Cite

Akshatha G C, Naresh E, Karthik D U, Mohan Kumar T G, K. A. S. (2024). A Framework for Spam Toxicity Level Detection using Machine Learning Models. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 2626–2634. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5735

Section

Research Article