Innovative Methods for Classifying COVID-19 using Amino Acid Encoding Combined with Recursive Feature Elimination for XGBoost Classification

Authors

  • Anurag Golwalkar, Research Scholar, Department of Computer Science and Engineering, SAGE University, Indore
  • Abhay Kothari, Research Supervisor, Department of Computer Science and Engineering, SAGE University, Indore (M.P.), India

Keywords

COVID-19, Amino Acid Sequences, Recursive Feature Elimination, XGBoost, NGDC dataset

Abstract

This paper introduces a method for identifying COVID-19 by analyzing amino acid sequences. The method has two stages: first, the amino acids are encoded to transform the biological data into a format suitable for computational analysis; second, Recursive Feature Elimination (RFE) is used to reduce the feature set before classification with the XGBoost algorithm. RFE iteratively evaluates the features and discards the least significant ones, which streamlines the classification process. Combined with the gradient-boosting framework of XGBoost, this approach simplifies the representation of the complex amino acid sequences and improves classification performance. The method achieved an accuracy of 99.89%, with a sensitivity of 99.87% and a specificity of 99.75%. These metrics were obtained using just 7 features, fewer than other methods require, which underscores the efficiency of the approach and reduces computational time to 2.43 seconds. The approach offers a fast, accurate, and efficient method for COVID-19 classification in bioinformatics and epidemiology. Its simplicity and high accuracy make it a promising tool for rapid screening and early detection of COVID-19, which is crucial in managing and controlling outbreaks.
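The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the encoding scheme or the implementation, so everything below is an assumption made for illustration: the encoding is a simple amino acid composition (relative residue frequencies), the importance score is a stand-in based on the gap between class means rather than XGBoost feature importances, and the sequences, labels, and function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def encode_composition(sequence):
    """Encode a sequence as 20 relative residue frequencies (assumed scheme)."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS) or 1
    return [counts[aa] / total for aa in AMINO_ACIDS]


def mean_gap_importance(X, y, active):
    """Stand-in importance (not XGBoost): absolute gap between class means."""
    scores = {}
    for j in active:
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores[j] = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return scores


def recursive_feature_elimination(X, y, n_keep, importance=mean_gap_importance):
    """Re-score the remaining features and drop the weakest until n_keep remain."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        scores = importance(X, y, active)
        weakest = min(active, key=lambda j: scores[j])
        active.remove(weakest)
    return active


def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return (tp + tn) / len(pairs), tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy sequences; label 1 marks the positive class.
sequences = ["MKKNNLKK", "KNKKMNNL", "MGAAGLAG", "GAGMAALG"]
labels = [1, 1, 0, 0]

X = [encode_composition(s) for s in sequences]
selected = recursive_feature_elimination(X, labels, n_keep=4)
selected_residues = [AMINO_ACIDS[j] for j in selected]
print(selected_residues)
```

In a fuller pipeline, one option would be to pass an `xgboost.XGBClassifier` to scikit-learn's `RFE` with `n_features_to_select=7`, so that the importance scores come from the boosted trees themselves. Whether the authors used that combination of libraries is not stated in the abstract.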

Published

12.01.2024

How to Cite

Golwalkar, A., & Kothari, A. (2024). Innovative Methods for Classifying COVID-19 using Amino Acid Encoding Combined with Recursive Feature Elimination for XGBoost Classification. International Journal of Intelligent Systems and Applications in Engineering, 12(12s), 466–486. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4532

Section

Research Article