IFSFSPIS: Incremental Feature Selection and Feature Sensitive Progressive Instance Selection Models for BigData Analytics

Authors

  • Subhash Kamble Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore University, Bengaluru, Karnataka – 560001, India
  • Arunalatha J. S. Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore University, Bengaluru, Karnataka – 560001, India
  • Venugopal K. R. Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore University, Bengaluru, Karnataka – 560001, India

Keywords:

BigData Analytics, Feature Sensitive Progressive Instance Selection, Incremental Feature Selection, Select-k-Best

Abstract

In this paper, a highly robust, first-of-its-kind Incremental Feature Selection (IFS) framework is designed for BigData Analytics that treats both feature reduction and feature-sensitive instance selection as a viable solution for BigData Analytics. Unlike classical threshold-based feature selection methods, this work designs an IFS model that couples Chi-Squared IFS with Feature Sensitive Progressive Instance Selection (FSPIS), intended to address the Volume, Variety, Velocity, and Veracity aspects of BigData simultaneously. The FSPIS model executes K-Means clustering over the selected features and performs incremental stratified instance selection. The proposed FSPIS model initiates feature selection with a minimum data volume of 20% (and a maximum of 80%), which is continuously updated by appending newly ranked features and the corresponding instances until the expected accuracy is achieved. To ensure generalizability of the solution, the FSPIS model can be applied within an ensemble learning model encompassing Bagging, AdaBoost, k-NN, Random Forest, and Extended-Tree classifiers as base classifiers to perform consensus-based classification. Simulation results over different datasets confirm that the proposed FSPIS model selects the fewest features while retaining higher statistical performance (i.e., accuracy) and lower computational time than other state-of-the-art techniques.
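
A minimal sketch of this pipeline is given below, assuming scikit-learn and a generic tabular dataset. The function name ifsfspis, the 5% growth step, the number of K-Means clusters, the 70/30 evaluation split, the accuracy target, and the reading of the "Extended-Tree" classifier as scikit-learn's Extra-Trees are assumptions made for illustration; this is an outline of Chi-squared Select-k-Best feature selection, cluster-stratified progressive instance selection, and consensus ensemble classification, not the authors' implementation.

    # Sketch of the IFSFSPIS pipeline: Chi-squared Select-k-Best feature
    # selection, cluster-stratified progressive instance selection (20%-80%),
    # and consensus ensemble classification. Parameters are illustrative.
    from sklearn.cluster import KMeans
    from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                                  ExtraTreesClassifier, RandomForestClassifier,
                                  VotingClassifier)
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.preprocessing import MinMaxScaler

    def ifsfspis(X, y, target_acc=0.95):
        X = MinMaxScaler().fit_transform(X)        # chi2 requires non-negative values
        n_features = X.shape[1]
        # Majority-vote ensemble over the five base classifiers.
        ensemble = VotingClassifier(
            estimators=[("bag", BaggingClassifier()),
                        ("ada", AdaBoostClassifier()),
                        ("knn", KNeighborsClassifier()),
                        ("rf", RandomForestClassifier()),
                        ("et", ExtraTreesClassifier())],
            voting="hard")
        for pct in range(20, 85, 5):               # grow volume from 20% to 80%
            frac = pct / 100.0
            k = max(1, int(frac * n_features))
            # Chi-squared Select-k-Best: keep the k highest-ranked features.
            X_sel = SelectKBest(chi2, k=k).fit_transform(X, y)
            # Feature-sensitive instance selection: cluster on the selected
            # features, then draw a cluster-stratified sample of instances.
            clusters = KMeans(n_clusters=5, n_init=10).fit_predict(X_sel)
            X_sub, _, y_sub, _ = train_test_split(
                X_sel, y, train_size=frac, stratify=clusters, random_state=0)
            X_tr, X_te, y_tr, y_te = train_test_split(
                X_sub, y_sub, test_size=0.3, stratify=y_sub, random_state=0)
            acc = ensemble.fit(X_tr, y_tr).score(X_te, y_te)
            if acc >= target_acc:                  # stop once expected accuracy is met
                return k, frac, acc
            # otherwise append the next block of ranked features and instances
        return k, frac, acc

In each pass, the selected feature volume and the cluster-stratified instance sample grow together until the accuracy target is met or the 80% ceiling is reached, mirroring the incremental update described in the abstract.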

Published

21.09.2023

How to Cite

Kamble, S., Arunalatha, J. S., & Venugopal, K. R. (2023). IFSFSPIS: Incremental Feature Selection and Feature Sensitive Progressive Instance Selection Models for BigData Analytics. International Journal of Intelligent Systems and Applications in Engineering, 11(4), 743–762. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3609

Issue

Vol. 11 No. 4 (2023)

Section

Research Article