A Review of Feature Selection Methods in Big Data

Authors

  • Hiba ALMarwi, Ghaleb H. AL-Gaphari

Keywords:

data dimensionality reduction, evolutionary algorithms, feature selection, filter methods, high dimensional data, machine learning.

Abstract

The exponential growth of available data has led to high-dimensional datasets characterized by hundreds of variables. Consequently, feature selection has become a critical task in data mining and machine learning. Its primary objective is to mitigate the curse of dimensionality by reducing data dimensionality and improving algorithm performance, particularly in classification tasks. This paper aims to give the reader a comprehensive understanding of feature selection techniques beyond the traditional categorization into filter, wrapper, and embedded approaches. Instead, we present feature selection as a combinatorial optimization or search problem, leading to a novel categorization of methods into exhaustive search, heuristic search, metaheuristic search, embedded methods, and hybrid approaches. By examining these methods, we shed light on their respective strengths and weaknesses. Furthermore, we address current problems and challenges in the field, paving the way for future research and identifying promising areas of investigation.
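As an illustration of the search-based view taken in the abstract, the following is a minimal, self-contained sketch of heuristic search for feature selection, namely greedy sequential forward selection. The correlation-based scoring function and the synthetic data are illustrative assumptions, standing in for a real filter criterion or a classifier-based wrapper evaluation.

```python
# Sketch: sequential forward selection, one of the heuristic search
# strategies in the paper's taxonomy. The subset score below is a toy
# stand-in (Pearson correlation of the summed selected features with
# the target), not a method from the paper.
import random

def score(features, X, y):
    """Toy subset score: |Pearson correlation| of the sum of the
    selected features with the target."""
    if not features:
        return 0.0
    sums = [sum(row[f] for f in features) for row in X]
    mx = sum(sums) / len(sums)
    my = sum(y) / len(y)
    cov = sum((s - mx) * (t - my) for s, t in zip(sums, y))
    vx = sum((s - mx) ** 2 for s in sums) ** 0.5
    vy = sum((t - my) ** 2 for t in y) ** 0.5
    return abs(cov / (vx * vy)) if vx and vy else 0.0

def forward_selection(X, y, k):
    """Greedily add, at each step, the feature whose inclusion yields
    the best-scoring subset, until k features are chosen."""
    selected, remaining = [], list(range(len(X[0])))
    while len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f], X, y))
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic data: feature 0 tracks the label, features 1-3 are noise.
random.seed(0)
y = [random.choice([0, 1]) for _ in range(100)]
X = [[t + random.gauss(0, 0.1)] + [random.gauss(0, 1) for _ in range(3)]
     for t in y]

print(forward_selection(X, y, 2))  # feature 0 is picked first
```

This greedy strategy evaluates only O(n·k) subsets instead of the 2^n required by exhaustive search, which is exactly the trade-off between optimality and scalability that motivates the heuristic and metaheuristic categories surveyed in the paper.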




Published

13.11.2024

How to Cite

Hiba ALMarwi. (2024). A Review of Feature Selection Methods in Big Data. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 4347–4366. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/7055

Section

Research Article