Prediction and Analysis of Crime against Women Data using Machine Learning and Statistical Imputation Techniques
Keywords:
Statistics, Machine Learning, Missing Values Imputation, Fuzzy C-means, K-means, KNN
Abstract
Data analytics is a broad and active research field worldwide. Missing data is a significant difficulty in this discipline because missing values degrade algorithm performance. Imputation of missing values is therefore critical: incorrect imputation leads to erroneous predictions, so missing data must be handled efficiently. In this article, statistical and machine-learning-based imputation methods are proposed, and the imputed values are applied to two crime-against-women data sets. Thirteen machine learning algorithms are implemented using these imputed values. Regression models including simple linear, multiple linear, KNN regression (KNNr), support vector regression (SVR), polynomial, decision tree, and random forest regression are used to predict the rate of crime against women in India and in Salem District, Tamil Nadu. A novel algorithm, KNN_ET, is proposed for missing data imputation, and its results are compared with other statistical and machine-learning-based imputations: mean, median, mode, K-Means clustering, fuzzy C-Means clustering, K-Medoids, and the K-nearest neighbor method. The main aim of this work is to reduce errors and improve machine learning accuracy. Algorithm efficiency is compared for prediction on the basis of MSE, MAE, and RMSE. The proposed KNN_ET algorithm gives the largest error reduction, with an accuracy of 94.78% on the Salem District crime data set and 98.7% on the India-level crime data set, which is better than all the other imputation techniques compared. In the future, this work should help the police department control crimes against women in India and Salem.
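The kind of comparison the abstract describes (filling a missing value by different imputation methods, then scoring each with MSE, MAE, and RMSE) can be illustrated with a minimal sketch. The abstract does not specify how KNN_ET works, so it is not reproduced here; the code below shows only plain mean/median imputation and ordinary K-nearest-neighbor imputation. The function names and the toy `rows` data are hypothetical and are not taken from the paper's data sets.

```python
import math
from statistics import mean, median

def impute_constant(column, strategy="mean"):
    """Fill None entries with the mean or median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in column]

def knn_impute(rows, target, k=2):
    """Fill a missing rows[i][target] with the average target value of the
    k nearest complete rows (Euclidean distance on the other columns).
    Assumes only the target column can contain None."""
    complete = [r for r in rows if r[target] is not None]
    filled = []
    for r in rows:
        if r[target] is not None:
            filled.append(list(r))
            continue

        def dist(other, r=r):
            # Distance over every column except the one being imputed.
            return math.sqrt(sum((a - b) ** 2
                                 for j, (a, b) in enumerate(zip(r, other))
                                 if j != target))

        neighbours = sorted(complete, key=dist)[:k]
        new_row = list(r)
        new_row[target] = mean(n[target] for n in neighbours)
        filled.append(new_row)
    return filled

def error_metrics(actual, predicted):
    """Return MSE, MAE, and RMSE between two equal-length sequences."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = mean(e * e for e in errors)
    mae = mean(abs(e) for e in errors)
    return {"MSE": mse, "MAE": mae, "RMSE": math.sqrt(mse)}

# Hypothetical toy data: (year index, reported cases); one value is missing.
rows = [(1, 10.0), (2, 12.0), (3, 14.0), (4, None), (5, 18.0), (6, 20.0)]
true_value = 16.0  # assumed ground truth for the missing entry

cases = [r[1] for r in rows]
mean_fill = impute_constant(cases, "mean")[3]
median_fill = impute_constant(cases, "median")[3]
knn_fill = knn_impute(rows, target=1, k=2)[3][1]

for name, value in [("mean", mean_fill), ("median", median_fill), ("knn", knn_fill)]:
    print(name, value, error_metrics([true_value], [value]))
```

On this toy series the neighbor-based fill follows the local trend, while the mean and median fills are pulled toward the centre of the column. This shows only the mechanics of the comparison and says nothing about the paper's reported results.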
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, users must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.