Indian Phishing Landscape: A Machine Learning and Deep Learning Approach for Detecting Malicious URLs and Curating an Indigenous Dataset
Keywords:
CNN, Cyber Security, Deep Learning, Indian Malicious Website Dataset, LSTM, Machine Learning, RNN, URL Phishing
Abstract
Owing to advancing technology, activities such as searching the web for critical information, posting on social media, and transacting online have increased tremendously. As a result, it is easy to fall into the trap of phishing websites that clone legitimate sites. The paper applies Machine Learning classifiers (Decision Tree, Random Forest, SGD, Logistic Regression, and K-Neighbors) and Deep Learning models (CNN, RNN, and LSTM) to two datasets to classify websites as ‘phishing’ or ‘benign’. One of the datasets is curated specifically from Indian phishing websites, which gives a more targeted approach to detecting malicious websites and enables a proactive defense against attackers. For feature extraction, the paper slices and categorizes URLs by length, top-level domain, symbol counts, and other lexical features. It also uses TF-IDF (term frequency-inverse document frequency) weighting, which overcomes the limitations of simple frequency counts by down-weighting terms common to all inputs and emphasizing those that distinguish individual inputs. By combining ML and DL techniques, training and testing on verified datasets, and curating a dedicated dataset of prominent Indian phishing websites, the paper addresses gaps in existing research.
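The two feature families named above, hand-crafted lexical features and TF-IDF weights, can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the symbol set in `SYMBOLS`, the IP-host flag, the trigram size, and the helper names `lexical_features`, `char_ngrams`, and `tfidf` are all illustrative assumptions. The TF-IDF shown is the unsmoothed textbook form, tf(t, d) · log(N / df(t)); scikit-learn's `TfidfVectorizer`, for example, applies a smoothed idf by default.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed set of symbols to count; the paper's exact list is not specified here.
SYMBOLS = "@-_.?=&/%"
IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def lexical_features(url: str) -> dict:
    """Hand-crafted lexical features: URL length, top-level domain, symbol counts."""
    # urlparse only fills in the host when a scheme is present, so assume http:// if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    has_ip_host = bool(IPV4.fullmatch(host))
    # An IP address has no TLD; otherwise take the last dot-separated label.
    tld = "" if has_ip_host or "." not in host else host.rsplit(".", 1)[-1]
    feats = {
        "url_length": len(url),
        "host_length": len(host),
        "tld": tld,
        "digit_count": sum(ch.isdigit() for ch in url),
        "has_ip_host": has_ip_host,  # assumed extra feature, not named in the abstract
    }
    for sym in SYMBOLS:
        feats[f"count_{sym}"] = url.count(sym)
    return feats


def char_ngrams(text: str, n: int = 3) -> list:
    """Overlapping character n-grams, e.g. 'abcd' -> ['abc', 'bcd'] for n = 3."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf(docs: list, n: int = 3) -> list:
    """Unsmoothed TF-IDF over character n-grams: (count / total) * log(N / df)."""
    counts = [Counter(char_ngrams(d.lower(), n)) for d in docs]
    num_docs = len(docs)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each n-gram counted once per document
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # guard against URLs shorter than n
        vectors.append(
            {g: (k / total) * math.log(num_docs / doc_freq[g]) for g, k in c.items()}
        )
    return vectors


if __name__ == "__main__":
    print(lexical_features("http://secure-login.example.in/verify?id=1"))
    for vec in tfidf(["paypal.com", "paypa1-login.com"]):
        print(sorted(vec.items(), key=lambda kv: -kv[1])[:3])
```

In the usage example, trigrams shared by both URLs (such as `com` and `pay`) receive a weight of zero, while trigrams unique to the look-alike URL (such as `-lo`) receive positive weight; this is the behaviour that distinguishes TF-IDF from raw frequency counts. Either representation could then be fed to the classifiers listed in the abstract.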
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets readers share and adapt the material, provided they give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.