Prediction and Analysis of Crime against Women Data using Machine Learning and Statistical Imputation Techniques


  • Tamilarasi P. Research Scholar, Department of Computer Science, Sri Sarada College for Women (Autonomous), Salem-16, Tamil Nadu, India.
  • Uma Rani R. Principal, Sri Sarada College for Women (Autonomous), Salem-16, Tamil Nadu, India.


Statistics, Machine Learning, Missing Values Imputation, Fuzzy C-means, K-means, KNN


Data analytics is a vast research field worldwide. Missing data is a significant difficulty in this discipline because missing values degrade an algorithm's performance. Imputation of missing values is therefore critical in research: incorrect imputation leads to erroneous predictions, so missing data must be handled efficiently. In this article, statistical and machine-learning-based imputation methods are proposed, and the imputed values are applied to two crimes-against-women data sets. Thirteen machine learning algorithms are implemented using these imputed values. Simple linear regression, multiple linear regression, KNNr, SVR, polynomial regression, decision tree regression, and random forest regression are used to predict the rate of crime against women in India and in Salem District, Tamil Nadu. A novel algorithm, KNN_ET, is proposed for missing data imputation, and its imputed values are compared with those of other statistical and machine-learning-based imputation methods: mean, median, mode, K-means clustering, fuzzy C-means clustering, K-medoids, and K-nearest neighbor. The main aim of this work is to reduce errors and improve machine learning accuracy. Finally, the efficiency of the algorithms is compared on prediction using MSE, MAE, and RMSE. The proposed KNN_ET algorithm reduces errors the most and achieves 94.78% accuracy on the Salem District crime data set and 98.7% accuracy on the India-level crime data set, which is better than all the other imputation techniques compared. In the future, this work is expected to help the police department control crimes against women in India and Salem.
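The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' KNN_ET algorithm, which the abstract does not specify; it only contrasts mean, median, and a plain k-nearest-neighbor fill, and scores them with MAE, MSE, and RMSE. As a simplification, the error is measured directly on the hidden values rather than on a downstream regression model. The data, the function names, and the choice k = 2 are all assumptions made for illustration, and the numbers are synthetic, not crime statistics.

```python
import math
import statistics

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))

def impute_constant(rows, col, reducer):
    """Fill missing entries (None) in column `col` with reducer(observed values)."""
    fill = reducer([r[col] for r in rows if r[col] is not None])
    return [[fill if (j == col and v is None) else v for j, v in enumerate(r)]
            for r in rows]

def distance(a, b, skip):
    """Euclidean distance between two rows, ignoring column `skip`."""
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in range(len(a)) if j != skip))

def impute_knn(rows, col, k=2):
    """Fill missing entries in `col` with the mean of that column over the
    k complete rows that are nearest in the remaining columns."""
    complete = [r for r in rows if r[col] is not None]
    filled = []
    for r in rows:
        r = list(r)
        if r[col] is None:
            nearest = sorted(complete, key=lambda c: distance(r, c, col))[:k]
            r[col] = sum(n[col] for n in nearest) / len(nearest)
        filled.append(r)
    return filled

# Synthetic toy data (hypothetical): [time index, count].
full = [[1.0, 12.0], [2.0, 19.0], [3.0, 31.0],
        [4.0, 42.0], [5.0, 48.0], [6.0, 61.0]]
COL = 1          # column that contains missing values
HIDDEN = [2, 4]  # rows whose value is masked to simulate missing data

masked = [[None if (i in HIDDEN and j == COL) else v for j, v in enumerate(r)]
          for i, r in enumerate(full)]

candidates = {
    "mean": impute_constant(masked, COL, statistics.mean),
    "median": impute_constant(masked, COL, statistics.median),
    "knn": impute_knn(masked, COL, k=2),
}

actual = [full[i][COL] for i in HIDDEN]
results = {}
for name, imputed in candidates.items():
    predicted = [imputed[i][COL] for i in HIDDEN]
    results[name] = {"MAE": mae(actual, predicted),
                     "MSE": mse(actual, predicted),
                     "RMSE": rmse(actual, predicted)}

for name, scores in results.items():
    print(name, scores)
```

On this toy series the neighbor-based fill has the lowest error of the three, because the masked values follow a trend that a single global mean or median cannot track. That ordering is a property of the synthetic data and says nothing about the results reported in the article.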








How to Cite

P., T., & Rani R., U. (2024). Prediction and Analysis of Crime against Women Data using Machine Learning and Statistical Imputation Techniques. International Journal of Intelligent Systems and Applications in Engineering, 12(13s), 483–495. Retrieved from



Research Article