Classification of Malicious Android Applications Using Naive Bayes and Support Vector Machine Algorithms



classification, machine learning, malware detection, naive bayes, support vector machine


As the use of smart devices grows, the amount of malicious software (malware) grows with it. Android is the most widely used operating system on smart devices, which makes it a frequent target for malware. Examining the permissions an application requests can indicate whether it is malicious, but this is a complex problem. In this study, machine learning methods were used to classify applications as malicious or harmless. For this purpose, a dataset containing 2854 malicious and 2870 harmless applications was created. Each application in the dataset has 116 permission features and a class feature indicating whether it is malicious. Using these features, Support Vector Machine (SVM) and Naïve Bayes (NB) models were trained. The 10-fold cross-validation method was used for training and testing. Accuracy, F1 score, precision, recall, and specificity were used to analyze the performance of the models, and ROC curves and AUC values were used to analyze their learning and prediction levels. In the tests, the SVM model achieved a classification success of 90.9% and the NB model 92.4%.
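The evaluation protocol described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses randomly generated binary permission vectors in place of the real dataset, and picks BernoulliNB and an RBF-kernel SVC because the abstract does not state which NB variant or SVM kernel was used. The function names make_synthetic_permissions and evaluate are hypothetical, and the scores it produces say nothing about the reported 90.9% and 92.4% results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC


def make_synthetic_permissions(n_malicious=2854, n_benign=2870,
                               n_features=116, seed=0):
    """Generate stand-in data: one binary flag per permission, label 1 = malicious.

    The per-permission request probabilities are invented for illustration.
    """
    rng = np.random.default_rng(seed)
    p_benign = rng.uniform(0.05, 0.30, n_features)
    p_malicious = np.clip(p_benign + rng.uniform(-0.05, 0.25, n_features),
                          0.01, 0.99)
    X = np.vstack([
        rng.random((n_malicious, n_features)) < p_malicious,
        rng.random((n_benign, n_features)) < p_benign,
    ]).astype(int)
    y = np.concatenate([np.ones(n_malicious, dtype=int),
                        np.zeros(n_benign, dtype=int)])
    return X, y


def evaluate(model, X, y, score_method, n_splits=10, seed=0):
    """Run stratified k-fold cross-validation and compute the abstract's metrics."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Out-of-fold class predictions: each sample is predicted by a model
    # that did not see it during training.
    pred = cross_val_predict(model, X, y, cv=cv)
    # Out-of-fold continuous scores, needed for the ROC curve and AUC.
    raw = cross_val_predict(model, X, y, cv=cv, method=score_method)
    scores = raw[:, 1] if raw.ndim == 2 else raw

    tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
        "auc": roc_auc_score(y, scores),
    }


# Small demo: 300 samples per class keeps the run short.
X, y = make_synthetic_permissions(n_malicious=300, n_benign=300)
models = {
    "NB": (BernoulliNB(), "predict_proba"),
    "SVM": (SVC(kernel="rbf"), "decision_function"),
}
results = {name: evaluate(model, X, y, method)
           for name, (model, method) in models.items()}
for name, metrics in results.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```

Calling make_synthetic_permissions() with its defaults reproduces the class sizes and feature count given in the abstract, at the cost of a longer run. Stratified folds are a design choice made here so that each fold keeps roughly the same ratio of malicious to harmless samples; the abstract only specifies that 10 folds were used.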



Author Biography

Murat Koklu, Selcuk University

Department of Computer Engineering






How to Cite

A. B. Yilmaz, Y. S. Taspinar, and M. Koklu, “Classification of Malicious Android Applications Using Naive Bayes and Support Vector Machine Algorithms”, Int J Intell Syst Appl Eng, vol. 10, no. 2, pp. 269–274, May 2022.



Research Article
