A Survey of Feature Selection Methods for the Analysis of Microarrays Data in Cancer

Authors

  • Dr. Shemim Begum, Department of CSE, Govt. College of Engg. and Textile Technology, Berhampore, MSD
  • Ebne Saud Khan, Ph.D. Research Scholar, Ram Krishana Dharmarth Foundation University, Ranchi
  • Dr. Debasis Chakraborty, Department of CSE, Asansol Engg. College, Asansol, W.B., India

Keywords:

cancer identification, feature selection, microarray data, support vector machines

Abstract

Selecting cancer-related genes and identifying cancer are of great concern to biologists who seek to interpret the behavior of genes in tissues at the molecular level. Microarray data contain a huge number of genes but comparatively few samples, which makes it difficult to design an appropriate machine learning model. To diagnose cancer and classify its types, it is crucial to obtain the significant genes associated with the disease, which is a feature selection (FS) problem on gene expression data. Because microarray datasets are also noisy, effective FS algorithms are essential for selecting the genes that matter for classification. This paper reviews FS methods reported in the literature for microarray data-based cancer diagnosis. We hope this review will guide researchers in advancing algorithms for cancerous gene identification.
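
To make the FS setting described above concrete, the following is a minimal sketch of one common filter-style approach: score each gene by the absolute Welch t-statistic between two classes and keep the top k. It is only an illustration under stated assumptions and is not a method proposed by this survey. The function names `welch_t` and `rank_genes`, the 0/1 labels, and the synthetic data (20 samples, 500 genes, of which genes 0 to 4 are made informative) are all hypothetical.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two samples of expression values."""
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    denom = math.sqrt(var_a / len(a) + var_b / len(b))
    if denom == 0.0:
        return 0.0  # constant gene: carries no class information
    return (statistics.mean(a) - statistics.mean(b)) / denom


def rank_genes(X, y, k):
    """Return indices of the k genes with the largest |t| between classes 0 and 1.

    X is a list of samples (each a list of gene expression values);
    y holds the matching 0/1 class labels.
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        class1 = [row[g] for row, label in zip(X, y) if label == 1]
        class0 = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append(abs(welch_t(class1, class0)))
    # Sort gene indices by descending score and keep the first k.
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order[:k]


# Synthetic toy data (assumption, not real microarray data):
# many more genes than samples, with a mean shift on genes 0-4 for class 1.
rng = random.Random(0)
n_samples, n_genes = 20, 500
informative = {0, 1, 2, 3, 4}
y = [i % 2 for i in range(n_samples)]
X = [
    [rng.gauss(4.0 if (g in informative and label == 1) else 0.0, 1.0)
     for g in range(n_genes)]
    for label in y
]

selected = rank_genes(X, y, k=5)
print(sorted(selected))
```

A univariate filter like this scores each gene independently, so it is cheap even when genes vastly outnumber samples, but it ignores redundancy and interactions among genes. In practice the ranking should be computed inside each cross-validation fold so that the selected genes do not leak information from the test samples.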

Published

16.08.2023

How to Cite

Begum, S., Khan, E. S., & Chakraborty, D. (2023). A Survey of Feature Selection Methods for the Analysis of Microarrays Data in Cancer. International Journal of Intelligent Systems and Applications in Engineering, 11(10s), 472–482. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3302

Section

Research Article