IFSFSPIS: Incremental Feature Selection and Feature Sensitive Progressive Instance Selection Models for BigData Analytics
Keywords:
BigData Analytics, Feature Sensitive Progressive Instance Selection, Incremental Feature Selection, Select-k-Best
Abstract
This paper presents a robust, first-of-its-kind Incremental Feature Selection (IFS) framework for BigData Analytics that treats feature reduction and feature-sensitive instance selection as a joint solution. Unlike classical threshold-based feature selection methods, the proposed model combines Chi-Squared IFS with Feature Sensitive Progressive Instance Selection (FSPIS), and is intended to address the Volume, Variety, Velocity, and Veracity aspects of BigData simultaneously. The FSPIS model runs K-Means clustering over the selected features and performs incremental stratified instance selection. Selection starts with a minimum volume of 20% (up to a maximum of 80%) and is progressively updated by appending new ranked features and their corresponding instances until the expected accuracy is reached. To ensure generalizability, the FSPIS model is applied to an ensemble learning model that uses Bagging, AdaBoost, k-NN, Random Forest, and Extended-Tree classifiers as base classifiers to perform consensus-based classification. Simulation results over different datasets confirm that the proposed FSPIS model selects a minimal set of features while retaining higher accuracy and lower computational time than other state-of-the-art techniques.
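The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, a bundled toy dataset, proportional random sampling within each K-Means cluster as the "stratified" step, a fixed 20%/40%/60%/80% schedule, majority (hard) voting as the consensus rule, and `ExtraTreesClassifier` as a stand-in for the "Extended-Tree" classifier. The names `select_instances`, `build_ensemble`, `incremental_selection`, and `target_accuracy` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler


def select_instances(X, fraction, n_clusters=5, seed=0):
    """Cluster rows with K-Means, then sample the same fraction from each cluster.

    Assumption: this proportional per-cluster sampling stands in for the
    paper's stratified instance selection, whose exact rule is not given here.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    rng = np.random.default_rng(seed)
    keep = []
    for cluster in np.unique(labels):
        members = np.flatnonzero(labels == cluster)
        n_take = max(1, int(round(fraction * members.size)))
        keep.extend(rng.choice(members, size=n_take, replace=False))
    return np.sort(np.array(keep))


def build_ensemble(seed=0):
    """Majority-vote ensemble over the base learners named in the abstract."""
    return VotingClassifier(
        estimators=[
            ("bagging", BaggingClassifier(random_state=seed)),
            ("adaboost", AdaBoostClassifier(random_state=seed)),
            ("knn", KNeighborsClassifier()),
            ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
            # Assumption: Extra-Trees used in place of "Extended-Tree".
            ("et", ExtraTreesClassifier(n_estimators=50, random_state=seed)),
        ],
        voting="hard",
    )


def incremental_selection(X_train, y_train, X_val, y_val,
                          target_accuracy=0.95, fractions=(0.2, 0.4, 0.6, 0.8)):
    """Grow the feature and instance budget from 20% to 80% until the target is met."""
    history = []
    for fraction in fractions:
        # Rank features by the chi-squared statistic and keep the top k.
        k = max(1, int(round(fraction * X_train.shape[1])))
        selector = SelectKBest(chi2, k=k).fit(X_train, y_train)
        X_sel = selector.transform(X_train)
        # Pick instances from clusters formed over the selected features only.
        rows = select_instances(X_sel, fraction)
        model = build_ensemble().fit(X_sel[rows], y_train[rows])
        accuracy = model.score(selector.transform(X_val), y_val)
        history.append((fraction, k, rows.size, accuracy))
        if accuracy >= target_accuracy:
            break
    return history


X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
scaler = MinMaxScaler().fit(X_train)  # chi2 requires non-negative training features
X_train, X_val = scaler.transform(X_train), scaler.transform(X_val)

history = incremental_selection(X_train, y_train, X_val, y_val)
for fraction, k, n_rows, accuracy in history:
    print(f"fraction={fraction:.0%} features={k} instances={n_rows} accuracy={accuracy:.3f}")
```

Each loop iteration re-ranks the features and re-clusters over the larger feature subset, so the instance sample is sensitive to which features are currently selected. The loop stops as soon as the validation accuracy reaches the chosen target, which is how a run can end with fewer than 80% of the features.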
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license requires users to give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.