Dataset Normalization in Cricket Score Prediction Using Weighted K-Means Clustering


  • M. Chandru, S. Prasath


Keywords: Cricket Score Prediction, Feature Selection, Machine Learning, Weighted K-Means Clustering


Cricket is a highly dynamic and unpredictable sport, which makes accurate score prediction difficult. This study proposes a novel approach to cricket score prediction that combines machine learning with feature selection based on weighted k-means clustering. The goal is to improve predictive accuracy by identifying and using the most relevant features from a diverse pool of cricket match attributes. The methodology begins with the collection of comprehensive match data, including player statistics, team performance metrics, and match conditions; these features form the basis of the predictive model. The dataset is first preprocessed to handle missing values, normalize the data, and address outliers. The preprocessed data is then passed to weighted k-means clustering, which groups features into clusters and assigns each feature a weight according to its significance within its cluster, so that the model prioritizes higher-weighted features during prediction. The prediction model is an ensemble of algorithms, such as decision trees, random forests, and gradient boosting, chosen to combine the strengths of different approaches. The features selected by weighted k-means clustering are fed into this ensemble, with the aim of improving its ability to capture the intricate patterns in cricket matches.
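The abstract does not state the exact weighting rule, so the following is only a minimal sketch of one common feature-weighted k-means variant, not the authors' implementation. It assumes that match records are z-score normalized and clustered, and that a feature's weight grows as its within-cluster dispersion shrinks. The feature names (run_rate, wickets_in_hand, toss_won), the data values, the exponent beta, and the farthest-point initialization are all hypothetical choices made for illustration.

```python
def zscore(rows):
    """Standardize every column to zero mean and unit variance."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [(sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5 or 1.0
            for j in range(m)]
    return [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]


def sq_dist(a, b, weights, beta):
    """Squared Euclidean distance with each feature scaled by weight ** beta."""
    return sum(w ** beta * (x - y) ** 2 for w, x, y in zip(weights, a, b))


def weighted_kmeans(rows, k, beta=2.0, iters=10):
    """Feature-weighted k-means (assumed variant); requires beta > 1."""
    m = len(rows[0])
    weights = [1.0 / m] * m  # start with equal feature weights
    # Deterministic farthest-point initialization (an illustrative choice).
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c, weights, beta)
                                          for c in centers))
        centers.append(list(far))
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest centre under the weighted distance.
        labels = [min(range(k), key=lambda c: sq_dist(r, centers[c], weights, beta))
                  for r in rows]
        # Centre step: each centre becomes the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Weight step: features with low within-cluster dispersion get
        # high weight; the weights always sum to 1.
        disp = [sum((r[j] - centers[lab][j]) ** 2
                    for r, lab in zip(rows, labels)) + 1e-12
                for j in range(m)]
        exponent = 1.0 / (beta - 1.0)
        weights = [1.0 / sum((disp[j] / disp[t]) ** exponent for t in range(m))
                   for j in range(m)]
    return labels, centers, weights


# Hypothetical match snapshots: [run_rate, wickets_in_hand, toss_won].
matches = [
    [4.0, 2, 0], [4.2, 3, 1], [3.8, 2, 1], [4.1, 3, 0],
    [8.0, 8, 0], [8.3, 9, 1], [7.9, 8, 1], [8.2, 9, 0],
]
labels, centers, weights = weighted_kmeans(zscore(matches), k=2)
for name, w in zip(["run_rate", "wickets_in_hand", "toss_won"], weights):
    print(f"{name}: {w:.3f}")
```

On this toy data the toss flag varies as much inside each cluster as across the whole dataset, so it receives the smallest weight, while the two features that separate the clusters receive most of the weight. A feature-selection step could then keep only the highest-weighted features before training the ensemble.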








How to Cite

M. Chandru. (2024). Dataset Normalization in Cricket Score Prediction Using Weighted K-Means Clustering. International Journal of Intelligent Systems and Applications in Engineering, 12(21s), 2651–2659. Retrieved from



Research Article