Data Mining Techniques in Bioinformatics Analysis
Keywords:
Data Mining, Bioinformatics, Microarray Datasets, k-means
Abstract
Microarray experiments yield large datasets containing expression values for thousands of genes across a small number of samples, usually no more than a few dozen. A major challenge is identifying groups of genes that are co-regulated and that, taken together, are strongly associated with a specific outcome variable. To address this, we propose k-means-based clustering that uses external information about the response variable to group genes. The algorithm is built on logistic regression analysis and integrates gene selection, supervision, gene clustering, and sample classification into a single process. In empirical studies on several microarray datasets, it identifies gene clusters whose expression centroids have strong predictive potential, outperforming conventional methods that analyze genes individually. The approach is relevant to medical diagnostics and prognostics, and it also supports functional genomics by offering insight into gene function and regulation.
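To make the idea of "expression centroids as predictors" concrete, the following is a minimal sketch and not the authors' algorithm. It assumes plain unsupervised k-means over gene profiles (the supervised variant described in the abstract is not specified here), followed by a ridge-penalized logistic regression fitted on the per-sample cluster centroids. The data are synthetic, and every name and parameter value (`kmeans_genes`, `fit_ridge_logistic`, `lam`, `lr`, the effect size of 3.0) is a hypothetical choice made for illustration.

```python
import math
import random


def kmeans_genes(genes, k, n_iter=20):
    """Plain (unsupervised) k-means over gene profiles; one label per gene."""
    n = len(genes)
    # Deterministic initialisation: evenly spaced genes serve as starting centres.
    centres = [list(genes[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(g, centres[c])))
            for g in genes
        ]
        # Update step: each centre becomes the mean profile of its member genes.
        for c in range(k):
            members = [g for g, lab in zip(genes, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_ridge_logistic(X, y, lam=0.01, lr=0.5, n_iter=300):
    """Ridge-penalised logistic regression fitted by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        # The penalty (lam / 2) * ||w||^2 adds lam * w to the gradient.
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Synthetic data (assumption): 30 genes x 20 samples, two balanced classes.
# The first 10 genes are shifted upward in class 1; the rest are pure noise.
rng = random.Random(0)
n_samples, n_genes, n_informative, k = 20, 30, 10, 2
y = [0] * 10 + [1] * 10
genes = [
    [rng.gauss(0.0, 1.0) + (3.0 * yi if g < n_informative else 0.0) for yi in y]
    for g in range(n_genes)
]

labels, centres = kmeans_genes(genes, k)

# Each sample is described by k features: the mean expression of each cluster.
X = [[centres[c][s] for c in range(k)] for s in range(n_samples)]
means = [sum(col) / n_samples for col in zip(*X)]
X = [[x - m for x, m in zip(row, means)] for row in X]  # centre each feature

w, b = fit_ridge_logistic(X, y)
pred = [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) >= 0.5 else 0 for row in X]
accuracy = sum(p == t for p, t in zip(pred, y)) / n_samples

print("cluster sizes:", [labels.count(c) for c in range(k)])
print("training accuracy:", accuracy)
```

The reported accuracy is computed on the training samples only, so it says nothing about generalization. With a few dozen samples, as in real microarray studies, a cross-validated estimate would be the appropriate measure.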
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, users must give appropriate credit, provide a link to the license, and indicate if changes were made. If they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.