Enhancing Predictive Accuracy in Phishing Attack Detection: A Study on the Impact of Collinearity and Feature Selection in ML-based Logistic Regression Models
Keywords:
Phishing URL, Machine Learning, Logistic Regression, Collinearity
Abstract
Phishing threats endanger individuals and businesses alike, underscoring the need for reliable detection techniques. Effective anti-phishing measures are crucial to protect confidential data and avoid monetary losses. This study examines logistic regression models and how effectively they can detect phishing attacks, focusing on the impact of collinearity and feature selection on predictive accuracy and model performance. In addition to logistic regression, other machine learning models (Decision Tree Classifier, Gaussian Naive Bayes, K Nearest Neighbors, and Linear Discriminant Analysis) were considered, both to analyze the relationships between predictor variables and the likelihood of a successful phishing attack and to compare the predictive accuracy of each method. Through experiments and comparisons, we show that addressing collinearity and employing feature selection techniques significantly improves the predictive accuracy of logistic regression models compared to other common machine learning models. Through a methodical feature engineering process focused on collinearity among predictors, we reduced the false negative rate of the logistic regression model by over 35%, which is crucial because false negatives are more costly than false positives. These findings provide insights for improving the efficiency of phishing detection systems and strengthening cybersecurity defenses against emerging threats.
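The abstract does not say which collinearity diagnostic or selection rule the study used, so the following is only a minimal sketch under stated assumptions. It shows one common way to reduce collinearity before fitting a classifier (a greedy pairwise-correlation filter), together with the false negative rate the abstract reports on. The feature names, the 0.9 threshold, the toy data, and the convention that label 1 marks a phishing URL are all hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sd_x * sd_y)


def drop_collinear(columns, threshold=0.9):
    """Greedy filter: keep a feature only if its absolute correlation
    with every already-kept feature is at most `threshold`.
    `columns` maps feature name -> list of values (insertion order matters)."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def false_negative_rate(y_true, y_pred, positive=1):
    """FN / (FN + TP): share of actual positives the model missed."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    return fn / (fn + tp) if (fn + tp) else 0.0


# Hypothetical toy features: num_chars nearly duplicates url_length.
features = {
    "url_length": [10, 20, 30, 40, 50],
    "num_chars": [11, 21, 29, 41, 52],
    "num_dots": [3, 1, 4, 1, 5],
}
selected = drop_collinear(features)

# Hypothetical labels and predictions (assumed: 1 = phishing, 0 = legitimate).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
fnr = false_negative_rate(y_true, y_pred)
```

On this toy data the filter keeps `url_length` and `num_dots` and drops `num_chars`, and one of the four phishing examples is missed. A variance inflation factor check is a frequently used alternative to the pairwise filter, since it also catches collinearity that involves more than two predictors.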
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, readers must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.