Identification of Lung Cancer Using Ensemble Methods Based on Gene Expression Data

Authors

  • K. Mary Sudha Rani, Research Scholar, Dept. of CSE, JNTUH; Assistant Professor, CSE Dept., Chaitanya Bharathi Institute of Technology, Hyderabad, Telangana, India
  • V. Kamakshi Prasad, Professor, Dept. of CSE, JNTUH, Telangana, India

Keywords:

Gene expression, Lung cancer, Ensemble machine learning, Random forest, AdaBoost

Abstract

Lung cancer is consistently ranked among the deadliest forms of cancer. Patients whose disease is detected and diagnosed early, for example through a low-dose CT scan, have a far better chance of survival. Nonetheless, this approach has certain drawbacks. Advances in DNA microarray technology now make it possible to measure the expression levels of hundreds of genes within each tissue sample. Although machine learning (ML) is increasingly used in medicine for lung cancer detection, the lack of interpretability of these models remains a significant hurdle. Machine learning can be applied to gene expression data (DNA microarray) to predict whether a patient has lung cancer. In this work, ensemble Random Forest and Adaptive Boosting (AdaBoost) classifiers were employed for the classification task. Kernel principal component analysis (KPCA) was used for feature reduction, and we calculated the correlation between each feature and the target using the statistical parameters provided by KPCA. Accuracy was measured as the proportion of correct predictions on a given dataset. We tested the validity of the proposed technique on a lung cancer dataset that includes GSE4115 from the Gene Expression Omnibus (GEO) database, together with the expression profiles it contains. The findings demonstrate the potential of the Identification of Lung Cancer (IOLC) model to detect lung cancer, with a reported accuracy of 81%, precision of 81.2%, recall of 78.9%, F-Measure of 77.7%, and error rate of 0.29%.
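The evaluation measures named in the abstract can be illustrated with a minimal sketch, assuming the usual confusion-matrix definitions for a binary (cancer / no cancer) classifier. The function name `binary_metrics`, the `BinaryMetrics` container, and the ten example labels are hypothetical and chosen only for illustration; they are not taken from the paper or from GSE4115.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f_measure: float
    error_rate: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute standard binary classification metrics from two label lists."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = len(pairs) - tp - fp - fn                               # true negatives

    # Accuracy: proportion of correct predictions over all samples.
    accuracy = (tp + tn) / len(pairs)
    # Precision and recall fall back to 0.0 when their denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # Error rate under the usual definition: proportion of wrong predictions.
    return BinaryMetrics(accuracy, precision, recall, f_measure, 1.0 - accuracy)


# Hypothetical labels for ten patients (1 = cancer, 0 = no cancer).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For these made-up labels there are 4 true positives, 4 true negatives, 1 false positive, and 1 false negative, so accuracy, precision, recall, and F-measure each come out at 0.8, and the error rate is about 0.2.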

Published

16.08.2023

How to Cite

Rani, K. M. S., & Prasad, V. K. (2023). Identification of Lung Cancer Using Ensemble Methods Based on Gene Expression Data. International Journal of Intelligent Systems and Applications in Engineering, 11(10s), 257–266. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3249

Section

Research Article