Improved Detection of Phishing Websites using Machine Learning

Authors

  • Sumo Sami M Aldaham, Osama Ouda, A.A. Abd El-Aziz

Keywords:

Website Phishing Detection; Machine Learning; Cybersecurity; Support Vector Machine; Decision Tree; Artificial Neural Networks

Abstract

Phishing attacks pose a significant threat in the cyber landscape, compromising the security of millions by exploiting trust in seemingly legitimate websites. These attacks deceive users into divulging sensitive information, posing substantial challenges to both individual and organizational security. The sophistication of phishing tactics, such as spear phishing and whaling, necessitates advanced detection methods beyond traditional rule-based systems. This paper addresses this issue by employing machine learning techniques to accurately identify and classify phishing websites. We deployed several machine learning models, including Decision Tree, Support Vector Machine (SVM), Artificial Neural Network (ANN), and Random Forest (RF), and rigorously tested and evaluated their efficacy in detecting phishing attacks. The dataset used in this paper was sourced from PhishTank.org, providing a real-world context for our models. Preprocessing steps included artifact removal, normalization, and handling of data inconsistencies to enhance model performance. These steps ensured that the models processed the most relevant and accurate information, improving their ability to differentiate between legitimate and malicious websites. The results of this study are promising: the Decision Tree model showed the highest accuracy at 96.7%, followed by the Random Forest model at 95.75%. These results confirm the ability of these models to detect phishing sites effectively. The ANN model, despite the challenges of overfitting, highlighted the potential of deep learning in this area, suggesting that with further fine-tuning and regularization it could provide more powerful detection capabilities. The SVM model reached a lower accuracy of 83.8%, which was not sufficient for reliable detection. It nevertheless provided important insights into which types of phishing strategies require different or more precise detection methods. This finding is critical for developing more targeted models in future work.
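
The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the feature names (URL length, dot count, IP-address flag), the sample values, and the helper functions are hypothetical, and it assumes min-max scaling for the normalization step and plain classification accuracy as the reported metric.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]


def drop_inconsistent(rows: Sequence[Row], labels: Sequence[int]) -> Tuple[List[List[float]], List[int]]:
    """Drop records with missing values and exact duplicate (features, label) pairs."""
    seen = set()
    clean_rows: List[List[float]] = []
    clean_labels: List[int] = []
    for row, label in zip(rows, labels):
        if any(value is None for value in row):
            continue  # incomplete record
        key = (tuple(row), label)
        if key in seen:
            continue  # verbatim duplicate
        seen.add(key)
        clean_rows.append(list(row))
        clean_labels.append(label)
    return clean_rows, clean_labels


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical records: [url_length, num_dots, has_ip_address]; label 1 = phishing.
raw_rows = [[54, 3, 0], [120, 6, 1], [54, 3, 0], [30, 1, 0], [None, 2, 0], [90, 5, 1]]
raw_labels = [0, 1, 0, 0, 1, 1]

clean_rows, clean_labels = drop_inconsistent(raw_rows, raw_labels)
normalized = min_max_normalize(clean_rows)

# Made-up predictions from some classifier, used only to show the metric.
example_predictions = [0, 1, 1, 1]
example_accuracy = accuracy(clean_labels, example_predictions)
print(len(clean_rows), example_accuracy)
```

On this toy data the duplicate record and the record with a missing value are dropped, leaving four records, and one wrong prediction out of four gives an accuracy of 0.75. A figure such as the 96.7% reported for the Decision Tree would be computed the same way over the held-out test set.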

Published

26.03.2024

How to Cite

Sumo Sami M Aldaham, Osama Ouda, A.A. Abd El-Aziz. (2024). Improved Detection of Phishing Websites using Machine Learning. International Journal of Intelligent Systems and Applications in Engineering, 12(21s), 4619–4633. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6351

Section

Research Article