Impact of Data Pre-Processing on COVID-19 Diagnosis Using Machine Learning Algorithms
Keywords:
COVID-19, Deep Learning, Machine Learning, K-Nearest Neighbours, Support Vector Machine
Abstract
Human coronaviruses impose a significant disease burden. Identifying coronavirus-infected patients with artificial intelligence has drawn the attention of researchers worldwide. Blood tests can contribute significantly to a reliable, accurate, and fast automated tool for COVID-19 diagnosis. However, medical datasets commonly suffer from several data problems, mainly class imbalance, missing values, and amplitude variations, and classifier performance cannot be assessed correctly without handling them. To this end, this paper proposes multiple solutions that combine several data pre-processing techniques with three dominant classifiers, namely Deep Learning (DL), K-Nearest Neighbors (KNN), and Support Vector Machines (SVM). After thorough treatment of the dataset, all three classifiers performed well against the gold standard, with SVM achieving the highest accuracy and sensitivity, 86% and 95% respectively. This study showed the clinical soundness and feasibility of using blood test analysis and machine learning as a replacement for rRT-PCR in detecting COVID-19-positive cases.
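The abstract names three data problems (missing values, amplitude variations, class imbalance) but does not say which technique handles each. The following is a minimal sketch, assuming mean imputation for missing values, min-max scaling for amplitude variations, and random oversampling for class imbalance; these are illustrative stand-ins, and the paper's actual techniques may differ. The cleaned data feed a KNN classifier (one of the three classifiers the paper evaluates), and the two reported metrics are computed as accuracy = correct / total and sensitivity = TP / (TP + FN). All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def fit_means(rows):
    """Per-column mean over observed (non-None) values; 0.0 if a column is entirely missing."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def impute(rows, means):
    """Replace each missing value (None) with the corresponding training-column mean."""
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]


def fit_min_max(rows):
    """Per-column minimum and maximum, used to remove amplitude (scale) differences."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def scale(rows, lo, hi):
    """Min-max scale each column with training statistics; constant columns map to 0.0."""
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in rows]


def oversample(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(rows), list(labels)
    for cls, n in counts.items():
        members = [x for x, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(members))
            out_y.append(cls)
    return out_x, out_y


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training rows closest to x in Euclidean distance."""
    nearest = sorted((math.dist(t, x), y) for t, y in zip(train_x, train_y))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Accuracy = correct / total; sensitivity = TP / (TP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true), (tp / (tp + fn) if tp + fn else 0.0)


def run_pipeline(train_x, train_y, test_x, k=3, seed=0):
    """Impute, scale, and balance using training data only, then classify test rows."""
    means = fit_means(train_x)
    train_x, test_x = impute(train_x, means), impute(test_x, means)
    lo, hi = fit_min_max(train_x)
    train_x, test_x = scale(train_x, lo, hi), scale(test_x, lo, hi)
    train_x, train_y = oversample(train_x, train_y, seed)  # balance the training split only
    return [knn_predict(train_x, train_y, x, k) for x in test_x]


if __name__ == "__main__":
    # Synthetic two-feature "blood test" rows (hypothetical values); None marks a missing entry.
    train_x = [[1.0, 2.0], [1.2, None], [0.9, 1.8], [1.1, 2.1], [5.0, 8.0], [5.2, 7.9]]
    train_y = [0, 0, 0, 0, 1, 1]  # 1 = COVID-19 positive (minority class)
    test_x, test_y = [[1.0, 2.0], [5.1, None]], [0, 1]
    pred = run_pipeline(train_x, train_y, test_x)
    print(pred, accuracy_and_sensitivity(test_y, pred))  # [0, 1] (1.0, 1.0)
```

Imputation means and scaling bounds are fitted on the training split only and reused on the test split, and oversampling touches only the training split, so no test information leaks into the classifier. Swapping `knn_predict` for an SVM or a neural network would leave the pre-processing steps unchanged.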
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license allows others to share and adapt the material, provided they give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.