Advanced Spam Email Detection using Machine Learning and Bio-Inspired Meta-Heuristics Algorithms

Authors

  • Purva Mange, Associate Professor and HoD, Symbiosis School of Planning Architecture and Design, Nagpur Campus, Symbiosis International (Deemed University), Pune, India
  • Aditi Lule, Assistant Professor, Symbiosis School of Planning Architecture and Design, Nagpur Campus, Symbiosis International (Deemed University), Pune, India
  • Rohini Savant, Assistant Professor, Symbiosis School of Planning Architecture and Design, Nagpur Campus, Symbiosis International (Deemed University), Pune, India

Keywords:

Convolutional neural network, Machine Learning, Genetic Algorithm, Particle Swarm Optimization, Spam detection

Abstract

Spam email remains a pervasive and evolving threat, so accurate methods for detecting and mitigating it are still needed. In this study, we present a novel spam detection method that combines machine learning with bio-inspired metaheuristic algorithms. Conventional rule-based and content-based approaches cannot always keep pace with spammers' constantly changing strategies. To address this, we propose a hybrid model that draws on the strengths of both machine learning and bio-inspired algorithms. Our approach uses a broad range of features drawn from email headers, body text, attachments, and sender behaviour. The machine learning component captures complex patterns and relationships in these data, which improves detection accuracy. The classifier is then optimized with a bio-inspired metaheuristic such as particle swarm optimization (PSO) or a genetic algorithm (GA). These algorithms, which mimic natural processes such as swarm behaviour and genetic evolution, tune the model's parameters for better performance. This integration eases dynamic adaptation to new spam strategies and reduces the number of false positives. Experiments on a real-world email dataset demonstrate the effectiveness of the approach: the hybrid model achieves higher accuracy and lower false positive rates than traditional spam detection techniques. The model is also robust against adversarial attacks and adapts to a range of email sources and languages.
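
The parameter-tuning step described in the abstract can be illustrated with a small sketch. The abstract does not specify the classifier, the feature encoding, or the PSO settings, so everything below is an assumption made for illustration only: a toy linear scorer stands in for the learned model, the eight data rows are invented, and the names `predict`, `error_rate`, and `pso` as well as the inertia and acceleration coefficients are hypothetical choices, not the authors' configuration.

```python
import random

# Made-up toy data (an assumption, not the paper's dataset). Each row is
# (suspicious_word_count, link_count, known_sender); label 1 means spam.
DATA = [
    ((5.0, 3.0, 0.0), 1), ((4.0, 6.0, 0.0), 1),
    ((7.0, 2.0, 0.0), 1), ((3.0, 5.0, 0.0), 1),
    ((0.0, 1.0, 1.0), 0), ((1.0, 0.0, 1.0), 0),
    ((0.0, 0.0, 1.0), 0), ((1.0, 1.0, 1.0), 0),
]

def predict(params, features):
    """Hypothetical linear scorer: last entry of params is the bias; spam if score > 0."""
    *weights, bias = params
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

def error_rate(params, data=DATA):
    """Fitness to minimise: fraction of misclassified emails."""
    wrong = sum(predict(params, x) != y for x, y in data)
    return wrong / len(data)

def pso(fitness, dim, n_particles=20, n_iters=50, seed=0,
        inertia=0.7, c_personal=1.5, c_global=1.5, bound=5.0):
    """Basic global-best particle swarm optimization over [-bound, bound]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # each particle's best position so far
    best_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g][:], best_fit[g]  # swarm-wide best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own
                # best, and pull toward the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move, then clamp the coordinate to the search bounds.
                pos[i][d] = max(-bound, min(bound, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit < best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], fit
                if fit < g_fit:
                    g_pos, g_fit = pos[i][:], fit
    return g_pos, g_fit

# Three feature weights plus one bias give a 4-dimensional search space.
best_params, best_error = pso(error_rate, dim=4)
print("best error rate:", best_error)
```

In a realistic setting the fitness function would be a validation metric of the actual classifier (for example, one that also penalises false positives), and a genetic algorithm could be swapped in for `pso` without changing `error_rate`.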

Published

10.11.2023

How to Cite

Mange, P., Lule, A., & Savant, R. (2023). Advanced Spam Email Detection using Machine Learning and Bio-Inspired Meta-Heuristics Algorithms. International Journal of Intelligent Systems and Applications in Engineering, 12(4s), 122–135. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3757

Section

Research Article