A Framework for Spam Toxicity Level Detection using Machine Learning Models
Keywords:
Machine Learning, Spam, Toxicity, Classification
Abstract
Email is a means of conveying a message to an individual or a group over the internet. When email first became popular, it could be used only for institutional and scientific research. As the technology has spread to reach almost everyone, anyone can now have an email address of their own, and billions of emails are sent as promotions, advertisements, and spam. Managing this volume is challenging, and users often have difficulty locating their important emails. In this work, a framework is proposed for classifying emails into multiple classes (bills, promotions, personal, spam, and OTP) so that users can easily identify the relevant ones. The framework uses supervised machine learning (ML) models, namely random forest (RF), support vector machine (SVM), naïve Bayes (NB), k-nearest neighbour (k-NN), and decision tree (DT), to classify emails. A web application was also developed for the same purpose. The results show that the RF and k-NN classifiers outperformed the other classifiers in terms of accuracy.
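As a rough illustration of the multi-class setup described in the abstract, the sketch below trains a small multinomial naïve Bayes classifier, one of the five model families named above, on the five classes. It is not the authors' implementation: the abstract does not specify the dataset, the features, or the training procedure, so the toy messages, the whitespace tokenizer, the Laplace smoothing value, and the names `TRAIN`, `MultinomialNB`, and `accuracy` are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Toy, invented messages for illustration only; NOT the paper's dataset.
TRAIN = [
    ("your electricity bill is due pay the invoice amount", "bills"),
    ("invoice for your water bill payment due date", "bills"),
    ("huge sale discount offer on shoes shop now", "promotions"),
    ("exclusive offer discount coupon for members shop today", "promotions"),
    ("are we still meeting for dinner on friday", "personal"),
    ("happy birthday hope to see you at dinner soon", "personal"),
    ("you won a lottery prize claim your money now", "spam"),
    ("claim free money winner click this link now", "spam"),
    ("your otp is 123456 do not share this code", "otp"),
    ("use code 987654 as your one time password otp", "otp"),
]


def tokenize(text):
    """Assumed preprocessing: lowercase and split on whitespace."""
    return text.lower().split()


class MultinomialNB:
    """Hypothetical minimal multinomial naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, samples):
        self.class_counts = Counter(label for _, label in samples)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.vocab = set()
        for text, label in samples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = len(samples)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n / self.total)
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, samples):
    """Fraction of samples whose predicted class matches the true class."""
    correct = sum(model.predict(text) == label for text, label in samples)
    return correct / len(samples)


model = MultinomialNB().fit(TRAIN)

if __name__ == "__main__":
    print(model.predict("your otp code is 456789"))
    print(accuracy(model, TRAIN))
```

Accuracy here is measured on the training messages only, which says nothing about generalization; a comparison like the one reported in the abstract would need a held-out test set and the other four classifiers evaluated under the same conditions.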
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, readers who use the material must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.