Cloud-Based Machine Learning Approach for Accurate Detection of Website Phishing

Authors

  • Saba Hussein Rashid, College of Computer Science and Mathematics, Tikrit University, Salaheddin, IRAQ
  • Wisam Dawood Abdullah, Asst. Prof., College of Computer Science and Mathematics, Tikrit University, Salaheddin, IRAQ

Keywords:

AWS, Cloud Machine Learning, Prediction Time, Website Phishing, XGBoost

Abstract

In recent years, the rapid growth of cyberspace has increased the challenges of information security. Website phishing is among the most dangerous cyberattacks: it is complex in nature and difficult to detect in real time. Cloud Machine Learning has emerged as an effective approach for detecting website phishing because it leverages Cloud Computing services to obtain accurate results quickly. This study therefore presents a Cloud Machine Learning method and evaluates the time required to detect website phishing using three SageMaker built-in algorithms: Extreme Gradient Boosting, Linear Learner, and k-Nearest Neighbor. Amazon Web Services is used for storage, training, evaluation, and online deployment on a large dataset of 11,430 samples and 89 features. The results indicate that Extreme Gradient Boosting outperformed the other two algorithms, with an accuracy of 96.4% and an online prediction time of 0.0005 minutes, followed by Linear Learner, with an accuracy of 94.4% and a prediction time of 0.0006 minutes. k-Nearest Neighbor obtained the lowest accuracy, 83.7%, and the longest prediction time, 0.0008 minutes.
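The two quantities compared in the abstract, accuracy and prediction time per sample, can be illustrated with a minimal local sketch. It is not the authors' SageMaker pipeline: it assumes a plain-Python k-Nearest Neighbor classifier, synthetic two-cluster data in place of the phishing dataset, and hypothetical helper names (`knn_predict`, `evaluate`, `minutes_to_ms`, `make_synthetic`). It also converts the reported prediction times from minutes to milliseconds, which may be easier to read.

```python
import math
import random
import time


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: math.dist(train_x[i], query),
    )[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if votes * 2 > k else 0


def evaluate(predict, test_x, test_y):
    """Return (accuracy, mean prediction time per sample in minutes)."""
    start = time.perf_counter()
    predictions = [predict(x) for x in test_x]
    elapsed_seconds = time.perf_counter() - start
    correct = sum(p == y for p, y in zip(predictions, test_y))
    accuracy = correct / len(test_y)
    minutes_per_sample = elapsed_seconds / len(test_y) / 60.0
    return accuracy, minutes_per_sample


def minutes_to_ms(minutes):
    """Convert a duration in minutes to milliseconds."""
    return minutes * 60.0 * 1000.0


def make_synthetic(n, n_features, rng):
    """Assumed stand-in data: two Gaussian clusters labelled 0 and 1."""
    xs, ys = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 * label
        xs.append([rng.gauss(center, 1.0) for _ in range(n_features)])
        ys.append(label)
    return xs, ys


rng = random.Random(0)
train_x, train_y = make_synthetic(200, 5, rng)
test_x, test_y = make_synthetic(50, 5, rng)

accuracy, minutes = evaluate(
    lambda x: knn_predict(train_x, train_y, x), test_x, test_y
)
print(f"accuracy={accuracy:.3f}, time per prediction={minutes:.2e} min")

# Prediction times reported in the abstract, converted to milliseconds.
reported = {"XGBoost": 0.0005, "Linear Learner": 0.0006, "k-NN": 0.0008}
reported_ms = {name: minutes_to_ms(m) for name, m in reported.items()}
print(reported_ms)
```

Timing the whole batch and dividing by the number of samples gives an average per-prediction latency. A deployed endpoint would additionally include network and serialization overhead, which this local sketch does not model.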

The implementation stages of the research methodology

Published

17.05.2023

How to Cite

Rashid, S. H., & Abdullah, W. D. (2023). Cloud-Based Machine Learning Approach for Accurate Detection of Website Phishing. International Journal of Intelligent Systems and Applications in Engineering, 11(6s), 451 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/2870

Section

Research Article