Machine Learning-Based Phishing Detection System
Keywords:
Feature Extraction, Machine Learning, Mendeley Dataset 2020, Phishing Detection, Random Forest
Abstract
Phishing attacks remain one of the most significant threats to cybersecurity, exploiting human weaknesses to illicitly obtain sensitive information such as credit card numbers, personal data, and passwords. These attacks are generally carried out through misleading emails or websites that mimic legitimate sources, and they can have severe consequences, such as financial losses, identity theft, and organizational data breaches. To address this growing concern, we developed a phishing detection system using a random forest (RF) model. The model was trained on the Mendeley Dataset 2020 and demonstrated strong performance in detecting phishing attempts. By analyzing critical features of a site's URL, the system distinguishes between legitimate and malicious sites. Our evaluation showed an accuracy of 99.4%, making the system a reliable tool for phishing detection. We integrated the system into a Chrome browser extension, enabling real-time detection and improving user protection. The paper highlights the potential of machine learning in cybersecurity and identifies opportunities for future research to improve phishing detection through advanced ML techniques and larger, more diverse datasets.
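To make the idea of URL-based features concrete, the following is a minimal sketch of how a URL might be turned into numeric features before being passed to a classifier such as a random forest. The function name `extract_url_features` and the specific feature names are assumptions chosen for illustration; the abstract does not list the features the authors actually used.

```python
from urllib.parse import urlparse
import ipaddress


def extract_url_features(url: str) -> dict:
    """Turn a URL string into a small dictionary of numeric features.

    The feature set here is hypothetical and only illustrates the kind of
    lexical signals a URL-based phishing classifier could consume.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # Flag hosts that are raw IP addresses rather than domain names.
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0

    return {
        "url_length": len(url),
        "dot_count": url.count("."),
        "hyphen_count": url.count("-"),
        "at_count": url.count("@"),
        "slash_count": url.count("/"),
        "digit_count": sum(ch.isdigit() for ch in url),
        "host_length": len(host),
        "host_is_ip": host_is_ip,
        "uses_https": int(parsed.scheme == "https"),
    }


if __name__ == "__main__":
    for example in ("https://example.com/",
                    "http://192.168.0.1/login-update/index.php"):
        print(example, extract_url_features(example))
```

Each dictionary could serve as one row of a feature table; a trained model would then map such rows to a legitimate or phishing label.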
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.