Efficient Microarray Gene Expression Data Sample Classification using Statistical Class Prediction Method

Authors

  • Rais Allauddin Mulla, Department of Computer Engineering, Vasantdada Patil Pratishthan College of Engineering and Visual Arts, Mumbai, Maharashtra, India
  • Mahendra Eknath Pawar, Department of Computer Engineering, Vasantdada Patil Pratishthan College of Engineering and Visual Arts, Mumbai, Maharashtra, India
  • Balasaheb Balkhande, Associate Professor, Vasantdada Patil Pratishthan College of Engineering and Visual Arts, Mumbai, Maharashtra, India
  • Vinod N. Alone, Assistant Professor, Department of Computer Engineering, Vasantdada Patil Pratishthan's College of Engineering, Mumbai, Maharashtra, India
  • Vikas Narayan Nandgaonkar, Department of Computer Engineering, Indira College of Engineering and Management, Pune, Maharashtra, India
  • Nidhi Ranjan, Department of AI & DS, Vasantdada Patil Pratishthan College of Engineering and Visual Arts, Mumbai, Maharashtra, India

Keywords:

Gene Expression, Classification, Machine Learning, Infiltration, Expression Data, Hybrid Deep Learning Method

Abstract

Microarray gene expression data provide insight into many biological processes and disease mechanisms, which makes them vital for biomedical research. A core task in microarray data analysis is classifying samples into predefined groups based on their gene expression patterns. Our approach uses a complete pipeline comprising data preprocessing, feature selection, and classification. To ensure data quality and consistency, preprocessing steps such as normalization, missing-value imputation, and noise reduction are first applied to the raw microarray data. A feature selection technique then identifies the most informative genes, those that contribute most to classification. Classification is carried out with a statistical class prediction approach built on a suitable statistical model, such as logistic regression, support vector machines, or random forests. To ensure robustness and generalizability, the chosen model is trained on a labelled training set and its performance is assessed by cross-validation. We carried out extensive tests on publicly accessible microarray gene expression datasets covering various diseases to evaluate the proposed strategy. The results show that our strategy outperforms previous approaches in classification precision, sensitivity, specificity, and overall predictive power. We also discuss the biological significance of the discovered gene markers, shedding light on putative molecular pathways underlying the disorders under investigation.
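The pipeline described in the abstract (preprocessing, feature selection, classification, cross-validated evaluation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the feature selection criterion, so a Welch t-statistic ranking is assumed here; logistic regression is picked from the models the abstract lists and is fitted with plain gradient descent; the data are synthetic, and all function names and parameter values are hypothetical.

```python
import math
import random
from statistics import mean, stdev

def impute_and_scale(train, test):
    """Mean-impute missing values (None) and z-score genes using training statistics only."""
    stats = []
    for g in range(len(train[0])):
        observed = [row[g] for row in train if row[g] is not None]
        stats.append((mean(observed), stdev(observed) or 1.0))  # guard against zero spread
    def transform(rows):
        return [[((mu if v is None else v) - mu) / sd for v, (mu, sd) in zip(row, stats)]
                for row in rows]
    return transform(train), transform(test)

def top_genes(X, y, k):
    """Assumed criterion: rank genes by absolute Welch t-statistic, keep the top k."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 1]
        b = [row[g] for row, label in zip(X, y) if label == 0]
        se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
        scores.append(abs(mean(a) - mean(b)) / (se or 1e-12))
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:k]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=200, l2=0.01):
    """L2-regularised logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * v for wj, v in zip(w, row))) - label
            gw = [gj + err * v for gj, v in zip(gw, row)]
            gb += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def stratified_folds(y, k, rng):
    """Split sample indices into k folds with roughly equal class balance."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def cross_validate(X, y, k=5, n_genes=5, seed=0):
    """k-fold CV; preprocessing and gene selection are refitted inside every fold."""
    tp = tn = fp = fn = 0
    for test_idx in stratified_folds(y, k, random.Random(seed)):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        Xtr, Xte = impute_and_scale([X[i] for i in train_idx], [X[i] for i in test_idx])
        ytr = [y[i] for i in train_idx]
        genes = top_genes(Xtr, ytr, n_genes)
        w, b = fit_logistic([[row[g] for g in genes] for row in Xtr], ytr)
        for row, i in zip(Xte, test_idx):
            pred = int(sigmoid(b + sum(wj * row[g] for wj, g in zip(w, genes))) >= 0.5)
            tp += pred == 1 and y[i] == 1
            tn += pred == 0 and y[i] == 0
            fp += pred == 1 and y[i] == 0
            fn += pred == 0 and y[i] == 1
    return {"accuracy": (tp + tn) / len(y),
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}

def make_synthetic(n_samples=40, n_genes=50, seed=1):
    """Synthetic stand-in for a microarray matrix: only the first 3 genes carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) + (2.5 * label if g < 3 else 0.0) for g in range(n_genes)]
        row[rng.randrange(n_genes)] = None  # simulate one missing measurement per sample
        X.append(row)
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_synthetic()
    print(cross_validate(X, y))
```

One design choice worth noting: imputation, scaling, and gene selection are recomputed from the training portion of each fold. Selecting genes on the full dataset before cross-validation would leak label information into the held-out samples and inflate the reported precision, sensitivity, and specificity.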

Published

16.08.2023

How to Cite

Allauddin Mulla, R., Pawar, M. E., Balkhande, B., Alone, V. N., Nandgaonkar, V. N., & Ranjan, N. (2023). Efficient Microarray Gene Expression Data Sample Classification using Statistical Class Prediction Method. International Journal of Intelligent Systems and Applications in Engineering, 11(10s), 691–700. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3324

Section

Research Article
