A Scalable Parallel Gene Selection Method Based on Hybrid Bio-Inspired Metaheuristic Algorithms with Shapley Value Analysis

Authors

  • Vijaya Lakshmi Alluri, Karteeka Pavan Kanadam, Hymavathi Thottathyl

Keywords:

Hybrid Genetic Dung beetle Optimization (HGDBO), Synthetic Minority Oversampling Technique (SMOTE), Cooperative based Kernel Shapley Values (CkSV).

Abstract

The development of microarray technology has contributed significantly to predicting cancer types and their subtypes through gene selection. Identifying highly discriminative genes in microarray data remains a challenging problem, and existing hybrid approaches are inadequate for it. This study therefore proposes a novel parallel gene selection approach based on a hybrid bio-inspired feature selection method. First, the microarray data are augmented using the Synthetic Minority Oversampling Technique (SMOTE) to increase dataset size. Next, the Cooperative based Kernel Shapley Values (CkSV) approach is employed to extract features and determine their Shapley values. The Hybrid Genetic Dung beetle Optimization (HGDBO) approach is then employed to identify the most valuable features. The process is executed on the Apache Hadoop Distributed File System, which stores the large datasets cost-effectively. The selected features are classified using several machine learning methods: Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), and K-nearest neighbour. The proposed approach is compared with other machine learning based classification algorithms. The analysis is implemented in Python and evaluated on eleven datasets. The simulation results show that the proposed strategy outperforms the existing methods. With the SVM classifier, the accuracy is 0.97 on dataset 1, 0.98 on dataset 2, 0.968 on dataset 3, 0.975 on dataset 4, 0.973 on dataset 5, 0.979 on dataset 6, 0.985 on dataset 7, 0.972 on dataset 8, 0.973 on dataset 9, 0.980 on dataset 10, and 0.979 on dataset 11.
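
As an illustration of the first two stages named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It shows the interpolation step at the core of SMOTE and an exact Shapley value computation for a small feature set. Exact enumeration is swapped in for the kernel-based approximation (CkSV) that the abstract names, because enumeration is only feasible for a handful of features but makes the definition explicit. The function names, the toy minority samples, and the value function `toy_value` (a stand-in for classifier accuracy on a gene subset) are all assumptions made for this example. The HGDBO search and the Hadoop execution are not sketched.

```python
import itertools
import math
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (core SMOTE step)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def shapley_values(n_features, value):
    """Exact Shapley values: each feature's marginal contribution to
    value(subset), weighted over all subsets of the remaining features."""
    phi = [0.0] * n_features
    n_fact = math.factorial(n_features)
    for i in range(n_features):
        rest = [j for j in range(n_features) if j != i]
        for size in range(len(rest) + 1):
            weight = (math.factorial(size)
                      * math.factorial(n_features - size - 1) / n_fact)
            for subset in itertools.combinations(rest, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical per-gene scores; gene 2 is uninformative by construction.
GENE_SCORE = [0.5, 0.3, 0.0]


def toy_value(subset):
    """Hypothetical stand-in for classifier accuracy on a gene subset."""
    total = sum(GENE_SCORE[j] for j in subset)
    if 0 in subset and 1 in subset:
        total += 0.2  # assumed synergy between genes 0 and 1
    return total


# Toy minority-class samples (two expression values per sample).
minority_samples = [[1.0, 2.0], [1.5, 2.5], [3.0, 1.0]]
augmented = minority_samples + smote_oversample(minority_samples, n_new=4)

phi = shapley_values(3, toy_value)
ranking = sorted(range(3), key=lambda j: phi[j], reverse=True)
```

In this toy setting the Shapley values sum to the value of the full gene set, and the synergy bonus is split equally between genes 0 and 1. A ranking such as `ranking` is the kind of input a downstream search like HGDBO could start from, although how the paper couples the two stages is not specified in the abstract.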

Published

03.07.2024

How to Cite

Vijaya Lakshmi Alluri. (2024). A Scalable Parallel Gene Selection Method Based on Hybrid Bio-Inspired Metaheuristic Algorithms with Shapley Value Analysis. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 1108 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6354

Section

Research Article