Extraction of Features from the GC-MS Chromatogram Unstructured Data using Multi-Class and Multi-Label Classification for the Injection of Preprocessed Dataset into Machine Learning Algorithms applicable for E-Nose

Authors

  • Sasedharen Chinnathambi, Research Scholar, School of Computer Science and Engineering, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India.
  • Gopinath Ganapathy, Professor, Department of Computer Science, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India.

Keywords:

Machine Learning, Random Forest, E-Nose, Feature Extraction, Support Vector Machine, Decision Tree

Abstract

Gas Chromatography-Mass Spectrometry (GC-MS) is a powerful tool for analyzing complex chemical mixtures and characterizing their chemical composition. This paper examines the chemical compositions of Indian Jasminum sambac, Rosa damascena, and human urine using GC-MS analysis. For Electronic Noses (E-Noses), which mimic the olfactory capabilities of living organisms, GC-MS data provide a valuable source of chemical information. However, the raw data generated by GC-MS are complex and unstructured, which makes them difficult to integrate with machine learning (ML) algorithms in E-Nose applications. This research focuses on feature extraction and on multi-class and multi-label classification, and proposes a machine learning algorithm for characterizing chemical compounds and their influence on odor classification. Exploratory Data Analysis (EDA) techniques are used to select important variables and explore the potential for discrimination, and linear interpolation is used to ease the integration of GC-MS data into ML algorithms for E-Nose applications. The approach employs multi-output classifiers with various base classifiers (e.g., Random Forest, Decision Tree) for multi-level compound classification in GC-MS datasets associated with jasmine, rose, and urine extracts. This work paves the way for automated and efficient compound recognition in complex aromatic profiles.
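
The abstract does not say how linear interpolation is applied. One plausible reading is that each chromatogram is resampled onto a common retention-time grid, so that every sample yields a feature vector of the same length. The sketch below illustrates that idea only. The function names, the grid, and the sample values are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def uniform_grid(start, stop, n):
    """Return n evenly spaced points from start to stop (inclusive)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    step = (stop - start) / (n - 1)
    return [start + k * step for k in range(n)]


def resample_linear(times, intensities, grid):
    """Linearly interpolate (time, intensity) points onto `grid`.

    Assumption: `times` is strictly increasing. Grid points outside the
    measured range are clamped to the nearest endpoint intensity.
    """
    if len(times) != len(intensities) or len(times) < 2:
        raise ValueError("need at least two (time, intensity) pairs")
    out = []
    for t in grid:
        if t <= times[0]:
            out.append(float(intensities[0]))
        elif t >= times[-1]:
            out.append(float(intensities[-1]))
        else:
            i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
            t0, t1 = times[i - 1], times[i]
            y0, y1 = intensities[i - 1], intensities[i]
            out.append(y0 + (y1 - y0) * (t - t0) / (t1 - t0))
    return out


# Hypothetical chromatogram: retention time (min) vs. detector intensity.
rt = [1.0, 2.0, 4.0]
signal = [0.0, 10.0, 30.0]

# Fixed-length feature vector on a shared 4-point grid from 1 to 4 min.
features = resample_linear(rt, signal, uniform_grid(1.0, 4.0, 4))
```

Vectors resampled this way share one set of columns, which is the tabular shape that multi-output classifiers such as those named in the abstract generally expect as input.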

Published

24.03.2024

How to Cite

Chinnathambi, S., & Ganapathy, G. (2024). Extraction of Features from the GC-MS Chromatogram Unstructured Data using Multi-Class and Multi-Label Classification for the Injection of Preprocessed Dataset into Machine Learning Algorithms applicable for E-Nose. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 138–145. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5231

Section

Research Article