An Identification and Analysis of Harmful URLs through the Application of Machine Learning Techniques

Authors

  • Swagat M. Karve Assistant Professor, S. B. Patil College of Engineering, Indapur (MH), India
  • Shital Kakad Assistant Professor, Information Technology Department, Marathwada Mitra Mandal College of Engineering, Pune
  • Swapnaja Amol Associate Professor, Information Technology Department, Marathwada Mitra Mandal College of Engineering, Pune
  • Ashwini B. Gavali Assistant Professor, S. B. Patil College of Engineering, Indapur (MH), India
  • Sonali B. Gavali Assistant Professor, Dr. D. Y. Patil Institute of Technology, Pimpri, Pune
  • Shrinivas T. Shirkande Assistant Professor, S. B. Patil College of Engineering, Indapur (MH), India

Keywords:

Malicious URLs, Cyber security, Malware, Phishing, Machine learning, Deep learning

Abstract

Malicious URLs are a significant cyber security threat: they endanger user security and cause substantial financial losses. Traditional detection methods that rely on blacklists cannot keep pace with rapidly evolving threats. In response, machine learning approaches have gained popularity for enhancing the efficiency of malicious URL detection. This paper presents a detailed survey that offers structured insight into the problem and formally defines the machine learning task of identifying malicious URLs. It examines feature representation and algorithm design. The objective of the survey is to provide a detailed analysis of harmful URLs for both researchers and cyber security experts.

Published

23.02.2024

How to Cite

Karve, S. M., Kakad, S., Swapnaja Amol, Gavali, A. B., Gavali, S. B., & Shirkande, S. T. (2024). An Identification and Analysis of Harmful URLs through the Application of Machine Learning Techniques. International Journal of Intelligent Systems and Applications in Engineering, 12(17s), 456–468. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4905

Section

Research Article