Indian Phishing Landscape: A Machine Learning and Deep Learning Approach for Detecting Malicious URLs and Curating an Indigenous Dataset

Authors

  • Dhruv Gada, Chinmayee Kale, Himanshu Goswami, Sridhar Iyer

Keywords:

CNN, Cyber Security, Deep Learning, Indian Malicious Website Dataset, LSTM, Machine Learning, RNN, URL Phishing

Abstract

As technology advances, activities such as browsing the web for critical information, posting on social media, and transacting online have increased tremendously. In this setting, it is easy to fall into the trap of phishing websites that clone legitimate sites. The paper applies Machine Learning techniques (Decision Tree, Random Forest, SGD, Logistic Regression, and K-Neighbors classifiers) and Deep Learning methods (CNN, RNN, and LSTM) to two datasets to classify websites as ‘phishing’ or ‘benign’. One of the datasets is curated to focus on Indian phishing websites, which allows a more targeted approach to detecting malicious websites and a proactive defense against attackers. The paper slices and categorizes the data by URL length, top-level domain, symbol counts, and other URL attributes for feature extraction. It also uses TF-IDF to generate term weights, since TF-IDF overcomes the limitations of simple frequency counts and yields a more distinctive representation of the input values. The paper bridges gaps in prior research in three ways: it combines ML and DL techniques for robust cybersecurity, it trains and tests models on authenticated datasets, and it curates a dataset of top Indian phishing websites.
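
The feature extraction and TF-IDF weighting mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the symbol set, the function names (`lexical_features`, `tokenize`, `tfidf`), the tokenization rule, and the example URLs are all assumptions made for illustration, and the TF-IDF formula is one common textbook variant (term frequency times log(N / document frequency)).

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical symbol set; the abstract does not list the symbols counted.
SYMBOLS = "@-_.?=&%/"


def lexical_features(url: str) -> dict:
    """Return simple lexical features (length, TLD, symbol counts) for a URL."""
    # urlparse only fills in the host when a scheme or a leading // is present.
    parsed = urlparse(url if "://" in url else "//" + url)
    host = parsed.hostname or ""
    features = {
        "url_length": len(url),
        "tld": host.rsplit(".", 1)[-1] if "." in host else "",
        "digit_count": sum(ch.isdigit() for ch in url),
    }
    for sym in SYMBOLS:
        features[f"count_{sym}"] = url.count(sym)
    return features


def tokenize(url: str) -> list:
    """Split a URL into lowercase alphanumeric tokens (an assumed rule)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


def tfidf(urls: list) -> list:
    """Compute per-URL TF-IDF weights: (count / total) * log(N / df)."""
    docs = [Counter(tokenize(u)) for u in urls]
    n_docs = len(docs)
    # Document frequency: number of URLs in which each token appears.
    df = Counter(tok for doc in docs for tok in doc)
    weights = []
    for doc in docs:
        total = sum(doc.values())
        weights.append(
            {tok: (cnt / total) * math.log(n_docs / df[tok])
             for tok, cnt in doc.items()}
        )
    return weights


# Example URLs use the reserved "example" name and are purely illustrative.
urls = [
    "http://secure-login.example.in/verify?id=12",
    "https://www.example.in/home",
]
features = lexical_features(urls[0])
weights = tfidf(urls)
```

In this sketch a token that appears in every URL (such as `example`) receives a weight of zero, while a token unique to one URL (such as `secure`) receives a positive weight. This is the sense in which TF-IDF goes beyond simple frequency counts. The resulting feature dictionaries and weights could then be fed to any of the classifiers named in the abstract.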

Published

12.06.2024

How to Cite

Dhruv Gada. (2024). Indian Phishing Landscape: A Machine Learning and Deep Learning Approach for Detecting Malicious URLs and Curating an Indigenous Dataset. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 1670–1679. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6465

Section

Research Article