A High-Level Ensemble Feature Selection Algorithm for Mitigating the Dimensionality in Stress Data

Authors

  • Prashant M. Suryavanshi, Research Scholar, Department of Computer Science and Engineering, Dr. A. P. J. Abdul Kalam University, Indore
  • Pradnya A. Vikhar, Research Supervisor, Department of Computer Science and Engineering, Dr. A. P. J. Abdul Kalam University, Indore (M.P.), India

Keywords:

HLE-FS, Naïve Bayes, ThreadPoolExecutor, PCA

Abstract

Stress is a common response to environmental and psychological factors and can harm both mental and physical health. Analyzing stress data with many features can reveal contributing factors and support the development of effective stress management strategies. However, high dimensionality makes such data prone to overfitting. Feature selection mitigates this problem and improves the performance of machine learning models on stress data. This paper proposes a high-level ensemble feature selection (HLE-FS) algorithm for stress data. The algorithm aims to identify the features most informative for stress classification, which can lead to a better understanding of the factors underlying stress and to more accurate stress prediction. It first preprocesses the input data: missing values are imputed using hybrid imputation, categorical variables are converted to numerical values using target encoding, and the data are normalized for compatibility with machine learning algorithms. It then applies three feature selection techniques as an ensemble: filter-based, wrapper-based, and embedding-based. The filter-based technique ranks features by information gain using Ranker search. The wrapper-based technique evaluates candidate feature subsets with a Naïve Bayes classifier, using Greedy Stepwise search with ThreadPoolExecutor to find the best subset. The embedding-based technique uses Principal Component Analysis (PCA) to reduce the dimensionality of the data and Ranker search to rank the PCA-derived features. The results of the three techniques are combined by majority voting, and the top-k features are extracted from the combined result.
Finally, the algorithm evaluates classification performance on the dataset with and without feature selection using a Random Forest classifier. Experimental results on stress data show that the proposed algorithm outperforms the existing system in both accuracy and computational efficiency. It effectively selects the most informative features from the input stress data, improving stress classification performance.
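
The two ensemble-specific steps named in the abstract, a greedy stepwise wrapper search parallelized with ThreadPoolExecutor and a majority vote over the selectors' outputs, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the `score_fn` callable (standing in for something like the cross-validated accuracy of a Naïve Bayes classifier on a feature subset), the `min_votes` threshold, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def greedy_stepwise(features, score_fn, max_workers=4):
    """Forward greedy search: add the single best feature per round.

    score_fn(subset) -> number is a hypothetical callable supplied by the
    caller, e.g. the accuracy of a classifier trained on that subset.
    """
    selected, best_score = [], float("-inf")
    remaining = list(features)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining:
            candidates = [selected + [f] for f in remaining]
            # Score every one-feature extension of the current subset in parallel.
            scores = list(pool.map(score_fn, candidates))
            top = max(range(len(remaining)), key=scores.__getitem__)
            if scores[top] <= best_score:
                break  # no extension improves the score, so stop searching
            best_score = scores[top]
            selected.append(remaining.pop(top))
    return selected


def majority_vote(selections, k, min_votes=2):
    """Keep features chosen by at least min_votes selectors, then take the top k.

    Ties in vote count are broken by the best (lowest) position a feature
    holds in any input list, then by name; this tie-break is an assumption.
    """
    votes = Counter()
    best_pos = {}
    for chosen in selections:
        # dict.fromkeys drops duplicates so each selector votes once per feature.
        for pos, feat in enumerate(dict.fromkeys(chosen)):
            votes[feat] += 1
            best_pos[feat] = min(pos, best_pos.get(feat, pos))
    eligible = [f for f in votes if votes[f] >= min_votes]
    eligible.sort(key=lambda f: (-votes[f], best_pos[f], f))
    return eligible[:k]
```

In this sketch, each selector (filter, wrapper, PCA-based) would contribute one ordered list of feature names to `majority_vote`. Threads help mainly when `score_fn` releases the interpreter lock, as many numerical libraries do; for pure-Python scoring a process pool may be the better choice.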

Published

12.01.2024

How to Cite

Suryavanshi, P. M., & Vikhar, P. A. (2024). A High-Level Ensemble Feature Selection Algorithm for Mitigating the Dimensionality in Stress Data. International Journal of Intelligent Systems and Applications in Engineering, 12(12s), 86–99. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4493

Section

Research Article