A Fast Simple Linear (FaSL) Unsupervised Feature Extraction Method

Authors

  • Karteeka Pavan Kanadam, G. L. N. Jaya Prada, Jeevanajyothi Pujari, Hymavathi Thottathyl

Keywords

Dimensionality reduction, linear, nonlinear, clustering, PCA, LDA

Abstract

The growing volume of high-dimensional data necessitates dimensionality reduction strategies (DRS), which reduce dimensions and extract meaningful insights by eliminating irrelevant features. DRS fall into two categories: linear and nonlinear. Nonlinear methods have gained considerable popularity in recent years because they handle real-world datasets with complex nonlinear structures effectively. However, linear datasets remain common in fields such as physics, economics, health informatics, and the social sciences. A major drawback of many existing linear and nonlinear DRS models is their computational expense. To address this issue, a fast, simple, linear (FaSL) unsupervised feature extraction method based on descriptive statistics is proposed. FaSL is evaluated by applying clustering to various benchmark datasets and is compared with five linear state-of-the-art methods. The experimental results demonstrate that FaSL outperforms PCA, LDA, LPP, ICA, and FA in both accuracy and computation time: the average accuracy improvement over these methods is, in order, 3.4, 9.2, 5.67, 3.97, and 0.075, while computational time is reduced by factors of 2.26, 3.1, 1.29, 7.58, and 6.2, respectively.
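The abstract describes FaSL only at a high level: an unsupervised linear extractor built on descriptive statistics. As a hypothetical illustration of that general idea (not the paper's actual algorithm, whose statistic and selection rule are not given in the abstract), a simple per-feature descriptive statistic such as variance can rank features and keep the most informative columns:

```python
import numpy as np

def variance_rank_features(X, k):
    """Keep the k features with the highest variance.

    Illustrative only: variance is one possible descriptive
    statistic; the criterion actually used by FaSL is not
    specified in the abstract.
    """
    scores = X.var(axis=0)                  # per-feature variance
    top = np.argsort(scores)[::-1][:k]      # indices of the k largest scores
    return X[:, top], top

rng = np.random.default_rng(0)
# 200 samples, 5 features; columns have very different spreads,
# with column 2 by far the widest.
X = rng.normal(0.0, [1.0, 1.0, 10.0, 0.5, 2.0], size=(200, 5))
X_red, kept = variance_rank_features(X, 2)
print(kept)         # expect the two widest columns, 2 and 4
print(X_red.shape)  # (200, 2)
```

A full reproduction of the paper's evaluation protocol would then cluster `X_red` (e.g., with k-means) and score the partition against ground-truth labels, comparing accuracy and runtime against PCA, LDA, LPP, ICA, and FA as the abstract describes.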

References

H. Liu and H. Motoda, Computational Methods of Feature Selection. Boca Raton, FL, USA: CRC Press, 2007.

H. Li, T. Jiang, K. Zhang, Efficient and robust feature extraction by maximum margin criterion, IEEE Transactions on Neural Networks 17 (1) (2006) 157–165.

J. Yan, B. Zhang, N. Liu, S. Yan, Q. Cheng, W. Fan, Q. Yang, W. Xi, Z. Chen, Effective and efficient dimensionality reduction for large-scale and streaming data preprocessing, IEEE Transactions on Knowledge and Data Engineering 18 (3) (2006) 320–333.

W. Wang, J. Shen, and L. Shao, “Video salient object detection via fully convolutional networks,” IEEE Trans. Image Process., vol. 27, no. 1, pp. 38–49, Jan. 2018.

J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, “Face recognition using LDA-based algorithms,” IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 195–200, Jan. 2003.

H. Gunduz, An efficient dimensionality reduction method using filter-based feature selection and variational autoencoders on Parkinson's disease classification, Biomedical Signal Processing and Control 66 (2021) 102452.

J. Han, Z. Ge, Effect of dimensionality reduction on stock selection with cluster analysis in different market situations, Expert Systems with Applications 147 (2020) 113226.

H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining, vol. 454. Berlin, Germany: Springer, 2012.

J. P. Cunningham and Z. Ghahramani, “Linear dimensionality reduction: Survey, insights, and generalizations,” J. Mach. Learn. Res., vol. 16, pp. 2859–2900, 2015.

Wei, Qin Yue, Kai Feng, Junbiao Cui, and Jiye Liang, "Unsupervised Dimensionality Reduction Based on Fusing Multiple Clustering Results," IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 3, pp. 3211–3223, 2023.

Ruisheng Ran, Ji Feng, Shougui Zhang, and Bin Fang, "A General Matrix Function Dimensionality Reduction Framework and Extension for Manifold Learning," IEEE Transactions on Cybernetics, vol. 52, no. 4, pp. 2137–2148, 2022.

Smita Rath, Alakananda Tripathy, Alok Ranjan Tripathy, Prediction of new active cases of coronavirus disease (COVID-19) pandemic using multiple linear regression model, Diabetes & Metabolic Syndrome: Clinical Research & Reviews, Volume 14, Issue 5, 2020, Pages 1467-1474, ISSN 1871-4021.

Davide Festa, Alessandro Novellino, Ekbal Hussain, Luke Bateson, Nicola Casagli, Pierluigi Confuorto, Matteo Del Soldato, Federico Raspini, Unsupervised detection of InSAR time series patterns based on PCA and K-means clustering, International Journal of Applied Earth Observation and Geoinformation, Volume 118, 2023, 103276, ISSN 1569-8432.

Qingsong Xiong, Haibei Xiong, Qingzhao Kong, Xiangyong Ni, Ying Li, Cheng Yuan, Machine learning-driven seismic failure mode identification of reinforced concrete shear walls based on PCA feature extraction, Structures, Volume 44, 2022, Pages 1429-1442, ISSN 2352-0124.

Nai Xue Zhang, Yuzhong Zhong, Songyi Dian, Rethinking unsupervised texture defect detection using PCA, Optics and Lasers in Engineering, Volume 163, 2023, 107470, ISSN 0143-8166.

Yingchao Huang, Abdul Bais, A novel PCA-based calibration algorithm for classification of challenging laser-induced breakdown spectroscopy soil sample data, Spectrochimica Acta Part B: Atomic Spectroscopy, Volume 193, 2022, 106451, ISSN 0584-8547.

Iqbal H. Sarker, Machine Learning: Algorithms, Real World Applications and Research Directions, SN Computer Science (2021) 2:160.

Fa Zhu, Junbin Gao, Jian Yang, Ning Ye, Neighborhood linear discriminant analysis, Pattern Recognition, Volume 123, 2022, 108422, ISSN 0031-3203

Shengkun Xie, Feature extraction of auto insurance size of loss data using functional principal component analysis, Expert Systems with Applications, Volume 198, 2022, 116780, ISSN 0957-4174.

Meier, A., Kramer, O. (2017). An Experimental Study of Dimensionality Reduction Methods. In: Kern-Isberner, G., Fürnkranz, J., Thimm, M. (eds) KI 2017: Advances in Artificial Intelligence. KI 2017. Lecture Notes in Computer Science, vol 10505. Springer, Cham. https://doi.org/10.1007/978-3-319-67190-1_14.

L. van der Maaten, E. Postma, J. van den Herik, “Dimensionality reduction: A comparative review,” tech. rep., Tilburg University, Netherlands, 2009. Tech. report TiCC TR 2009-005.

Rizgar R. Zebari, Adnan Mohsin Abdulazeez, Diyar Qader Zeebaree, Dilovan Asaad Zebari, Jwan Najeeb Saeed, "A Comprehensive Review of Dimensionality Reduction Techniques for Feature Selection and Feature Extraction," Journal of Applied Science and Technology Trends, Vol. 01, No. 02, pp. 56–70 (2020).

Haozhe Xie, Jie Li, Qiaosheng Zhang, Yadong Wang, "Comparison among dimensionality reduction techniques based on Random Projection for cancer classification," Computational Biology and Chemistry, Volume 65, 2016, Pages 165-172, ISSN 1476-9271.

Z. Zhao, L. Wang, H. Liu, and J. Ye, "On similarity preserving feature selection," IEEE Trans. Knowl. Data Eng., vol. 25, no. 3, pp. 619–632, Mar. 2013.

D. Wang, F. Nie, and H. Huang, “Global redundancy minimization for feature ranking,” IEEE Trans. Knowl. Data Eng., vol. 27, no. 10, pp. 2743–2755, 2015.

D. Cai, C. Zhang, and X. He, “Unsupervised feature selection for multi-cluster data,” in Proc. 16th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2010, pp. 333–342.

M. Qian and C. Zhai, “Robust unsupervised feature selection,” in Proc. 23rd Int. Joint Conf. Artif. Intell., 2013, pp. 1621–1627.

Z. Xu, I. King, M. R.-T. Lyu, and R. Jin, "Discriminative semi-supervised feature selection via manifold regularization," IEEE Trans. Neural Netw., vol. 21, no. 7, pp. 1033–1047, Jul. 2010.

K. Benabdeslem and M. Hindawi, “Efficient semi-supervised feature selection: Constraint, relevance, and redundancy,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 5, pp. 1131–1143, May 2014.

Mishra D, Sharma S (2021) Performance analysis of dimensionality reduction techniques: a comprehensive review. Adv Mech Eng:639–651.

Moore B (1981) Principal component analysis in linear systems: controllability, observability, and model reduction. IEEE Trans Autom Control 26(1):17–32.

Abdi H, Williams LJ (2010) Principal component analysis. Wiley Interdiscip Rev Comput Stat 2(4):433–459.

Wang S et al (2016) Semi-supervised linear discriminant analysis for dimension reduction and classification. Pattern Recogn 57:179–189.

I. T. Jolliffe, "Principal component analysis and factor analysis," in Principal Component Analysis, pp. 115–128, Springer, 1986.

X. He, P. Niyogi, Locality preserving projections, in: Advances in Neural Information Processing Systems, 2003, pp. 153–160.

Comon P (1994) Independent component analysis, a new concept? Signal Process 36(3):287–314.

Hyvärinen A, Oja E (2000) Independent component analysis: algorithms and applications. Neural Netw 13(4–5):411–430.

Rand WM (1971), "Objective criteria for the evaluation of clustering methods," Journal of the American Statistical Association, vol. 66, pp. 846-850.

M. Halkidi, Y. Batistakis, and M. Vazirgiannis, “On clustering validation techniques,” J. Intell. Inf. Syst., vol. 17, no. 2, pp. 107–145, 2001.

N. X. Vinh, J. Epps, and J. Bailey, "Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance," J. Mach. Learn. Res., vol. 11, pp. 2837–2854, 2010.

Davies DL and Bouldin DW (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, pp. 224-227.

Kaggle: your home for data science, https://www.kaggle.com/datasets/vikrishnan/boston-house-prices?resource=download.

UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets.php?format=&task=clu&att=&area=&numAtt=greater100&numIns=&type=&sort=nameUp&view=table.

Nguyen X. Vinh, Jeffrey Chan, Simone Romano and James Bailey, "Effective Global Approaches for Mutual Information based Feature Selection". Proceedings of the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'14), August 24-27, New York City, 2014.

P. Fränti and S. Sieranoja, "K-means properties on six clustering benchmark datasets," Applied Intelligence, 48 (12), 4743-4759, December 2018. https://doi.org/10.1007/s10489-018-1238-7.

P. Fränti, O. Virmajoki and V. Hautamäki, "Fast agglomerative clustering using a k-nearest neighbor graph," IEEE Trans. on Pattern Analysis and Machine Intelligence, 28 (11), 1875-1881, November 2006. https://cs.joensuu.fi/sipu/datasets/.

M. Friedman, “A comparison of alternative tests of significance for the problem of m rankings,” Ann. Math. Statist., vol. 11, no. 1, pp. 86–92, 1940.

P. B. Nemenyi, “Distribution-free multiple comparison,” Ph.D. dissertation, Princeton Univ., Princeton, NJ, USA, 1963.

Published

26.06.2024

How to Cite

Karteeka Pavan Kanadam. (2024). A Fast Simple Linear (FaSL) Unsupervised Feature Extraction Method. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 922 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6314

Issue

Section

Research Article