A Comparative Analysis of Dimension Reduction Techniques for High-Dimensional Classification Tasks
Keywords:
practitioners, classification, dimension reduction, computational efficiency

Abstract
As machine learning datasets continue to grow in dimensionality, efficient dimension reduction techniques have become essential for both computational efficiency and model performance. This study presents a comprehensive evaluation of three dimension reduction methods for preprocessing high-dimensional data prior to classification: Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). We evaluate their impact on the performance of three widely used classification algorithms: Random Forests, Support Vector Machines, and Neural Networks. Experiments on the MNIST and Digits benchmark datasets reveal that while classifying the raw, unreduced features yields the highest overall accuracy (95.31%), specialized techniques can offer significant computational advantages with minimal performance degradation. Our analysis provides empirical evidence that t-SNE offers an optimal balance between classification performance and training efficiency, particularly for Support Vector Machines. We further demonstrate that the effectiveness of dimension reduction techniques is dataset-dependent, suggesting the need for adaptive selection strategies based on data characteristics. This work provides practical guidance for practitioners seeking to optimize machine learning pipelines for high-dimensional classification tasks.
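To make the evaluation pipeline concrete, the following Python sketch illustrates the kind of comparison described above on the Digits dataset using scikit-learn. It is a minimal illustration, not the paper's exact configuration: the embedding dimensionality (n_components=2), the RBF-kernel SVM, and the 70/30 split are assumptions chosen for brevity, and UMAP is omitted because it requires the third-party umap-learn package.

    # Minimal sketch: compare classification on raw features vs. PCA vs. t-SNE
    # embeddings. Parameters are illustrative assumptions, not the paper's setup.
    import time
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.manifold import TSNE
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    X, y = load_digits(return_X_y=True)  # 1797 samples, 64 features

    # t-SNE in scikit-learn is transductive (no transform() for unseen points),
    # so we embed the full dataset first and split afterwards. This mirrors a
    # common evaluation shortcut but leaks test-set geometry into the embedding.
    embeddings = {
        "raw":  X,
        "pca":  PCA(n_components=2, random_state=0).fit_transform(X),
        "tsne": TSNE(n_components=2, random_state=0).fit_transform(X),
    }

    for name, Z in embeddings.items():
        Z_tr, Z_te, y_tr, y_te = train_test_split(
            Z, y, test_size=0.3, random_state=0, stratify=y
        )
        start = time.perf_counter()
        clf = SVC(kernel="rbf").fit(Z_tr, y_tr)      # time only the SVM fit
        elapsed = time.perf_counter() - start
        acc = accuracy_score(y_te, clf.predict(Z_te))
        print(f"{name:>4s}: accuracy={acc:.3f}, SVM fit time={elapsed:.3f}s")

Because t-SNE provides no out-of-sample transform, a deployment pipeline would need a parametric embedding or an out-of-sample extension; the transductive shortcut above is suitable only for benchmark-style comparisons like the one reported here.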