A Comparative Analysis of Dimension Reduction Techniques for High-Dimensional Classification Tasks

Authors

  • Fardeen NB, Sameer NB

Keywords

dimension reduction, classification, computational efficiency, practitioners

Abstract

As machine learning datasets continue to grow in dimensionality, efficient dimension reduction techniques have become essential for both computational efficiency and model performance. This study presents a comprehensive evaluation of three dimension reduction methods for preprocessing high-dimensional data prior to classification: Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). We evaluate their impact on the performance of three widely used classification algorithms: Random Forests, Support Vector Machines, and Neural Networks. Experiments conducted on benchmark datasets (MNIST and Digits) reveal that while classification on the full, unreduced feature set yields the highest overall accuracy (95.31%), specialized reduction techniques can offer significant computational advantages with minimal performance degradation. Our analysis provides empirical evidence that t-SNE offers an effective balance between classification performance and training efficiency, particularly for Support Vector Machines. We further demonstrate that the effectiveness of dimension reduction techniques is dataset-dependent, suggesting the need for adaptive selection strategies based on data characteristics. This work provides practical guidance for practitioners seeking to optimize machine learning pipelines for high-dimensional classification tasks.
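
To make the evaluated setup concrete, the following minimal sketch runs the reduce-then-classify comparison on the scikit-learn Digits dataset. It is illustrative only: the component count (16), the RBF-kernel SVM, and the 75/25 split are assumptions for demonstration, not the paper's reported configuration, and UMAP comes from the third-party umap-learn package.

import time

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Digits: 1,797 8x8 images flattened to 64 features, ten classes.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Reducers with a fit/transform interface drop straight into a Pipeline.
reducers = {"none": None, "pca": PCA(n_components=16, random_state=42)}
try:
    from umap import UMAP  # third-party: pip install umap-learn
    reducers["umap"] = UMAP(n_components=16, random_state=42)
except ImportError:
    pass  # skip UMAP if umap-learn is not installed

for name, reducer in reducers.items():
    steps = [StandardScaler()] + ([reducer] if reducer is not None else [])
    pipe = make_pipeline(*steps, SVC(kernel="rbf"))
    start = time.perf_counter()
    pipe.fit(X_train, y_train)
    fit_time = time.perf_counter() - start
    print(f"{name:>4}: accuracy={pipe.score(X_test, y_test):.4f} "
          f"fit_time={fit_time:.2f}s")

Note that t-SNE is absent from this loop by design: scikit-learn's TSNE provides no transform() for unseen points, so it is typically applied transductively (embedding training and test points together) rather than as a pipeline step.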

References

Abdi, H. and Williams, L. J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4):433–459.

Belkin, M. and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396.

Bellman, R. E. (1966). Dynamic programming. Science, 153(3731):34–37.

Bengio, Y., Courville, A., and Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828.

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.

Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.

Breiman, L., Friedman, J., Stone, C. J., and Olshen, R. A. (1984). Classification and Regression Trees. CRC Press.

Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3):273–297.

Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press.

Cunningham, J. P. and Ghahramani, Z. (2015). Linear dimensionality reduction: Survey, insights, and generalizations. Journal of Machine Learning Research, 16(1):2859–2900.

Deegalla, S. and Bostrom, H. (2006). Reducing high-dimensional data by principal component analysis vs. random projection for nearest neighbor classification. In 5th International Conference on Machine Learning and Applications (ICMLA’06), pages 245–250. IEEE.

Ding, C. and He, X. (2004). K-means clustering via principal component analysis. In Proceedings of the twenty-first international conference on Machine learning, page 29.

Espadoto, M., Martins, R. M., Kerren, A., Hirata, N. S., and Telea, A. C. (2019). Toward a quantitative survey of dimension reduction techniques. IEEE Transactions on Visualization and Computer Graphics, 27(3):2153–2173.

Friedman, J. H. (1997). On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery, 1(1):55–77.

Ghodsi, A. (2006). Dimensionality reduction: a short tutorial. Department of Statistics and Actuarial Science, University of Waterloo, 37(38):39.

Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.

Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3(Mar):1157–1182.

Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6):417.

Jolliffe, I. T. (2002). Principal Component Analysis. Springer.

Kittler, J. (1978). Feature selection and extraction. 59:81.

Krijthe, J. H. and van der Maaten, L. (2017). Comparing dimensionality reduction techniques using structure preserving assessment. In European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning.

Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1):1–27.

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.

Linderman, G. C., Rachh, M., Hoskins, J. G., Steinerberger, S., and Kluger, Y. (2019). Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nature Methods, 16(3):243–245.

van der Maaten, L. and Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605.

McInnes, L., Healy, J., and Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.

Pearson, K. (1901). LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572.

Pechenizkiy, M., Tsymbal, A., and Puuronen, S. (2004). PCA-based feature transformation for classification: issues in medical diagnostics. In Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems, pages 535–540. IEEE.

Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1):81–106.

Roweis, S. T. and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088):533–536.

Schölkopf, B., Smola, A. J., Bach, F., et al. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.

Sugiyama, M. (2007). Dimensionality reduction of multimodal labeled data by local fisher discriminant analysis. Journal of Machine Learning Research, 8(May):1027–1061.

Tenenbaum, J. B., De Silva, V., and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323.

Trunk, G. V. (1979). A problem of dimensionality: A simple example. IEEE Transactions on Pattern Analysis and Machine Intelligence, (3):306–307.

van der Maaten, L. (2014). Accelerating t-SNE using tree-based algorithms. Journal of Machine Learning Research, 15(1):3221–3245.

van der Maaten, L., Postma, E., and van den Herik, J. (2009). Dimensionality reduction: a comparative review. Journal of Machine Learning Research, 10:66–71.

van der Maaten, L. J., Postma, E. O., and Van Den Herik, H. J. (2007). Dimensionality reduction: A comparative review.

Vapnik, V. N. (1999). An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5):988–999.

Xanthopoulos, P., Pardalos, P. M., and Trafalis, T. B. (2013). A survey of dimensionality reduction approaches for high-dimensional data sets. In Linear and Nonlinear Dimensionality Reduction, pages 59–90. Springer.

Yang, Y. and Nataliani, Y. (2017). Dimension reduction methods for microarray data: a review. In AIP Conference Proceedings, volume 1862, page 030128. AIP Publishing LLC.

Yang, Y., Xu, D., Nie, F., Yan, S., and Zhuang, Y. (2005). Local and nonlocal preserving projection for dimensionality reduction. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 2, pages 1059–1065. IEEE.

Published

27.12.2022

How to Cite

Fardeen NB. (2022). A Comparative Analysis of Dimension Reduction Techniques for High-Dimensional Classification Tasks. International Journal of Intelligent Systems and Applications in Engineering, 10(3s), 420 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/7778

Issue

Vol. 10 No. 3s (2022)

Section

Research Article
