A Modified Binary Gray Wolf Optimization for Feature Selection Using Elite Wolf in Unstructured Data Stream

Authors

  • Suman R. Tiwari, Kaushik K. Rana, Viral H. Borisagar

Keywords:

cosine proximity, landmark window, session window, tumbling window

Abstract

Feature selection for stream clustering is difficult because incoming data streams are dynamic, varied, and unlabelled. Existing methods rely on labelled data, but assuming structure in heterogeneous, unlabelled streams is unrealistic. To address this, we introduce modified binary gray wolf optimization for stream feature selection (MBGWOSFS) using elite wolf, a feature selection method that applies an evolutionary algorithm to unsupervised learning in streaming environments. The method aims to enhance clustering performance by selecting relevant features from unstructured data streams. Evaluation with internal metrics (Dunn Index, Davies-Bouldin Index, Calinski-Harabasz Index, and Silhouette Score), along with separation and compactness, shows that MBGWOSFS outperforms traditional methods, providing effective feature selection without relying on labelled data or predefined structures. Across varying feature counts, Dunn indices range from 57.846 to 72.7538, indicating that the method excels in cluster separation, and Silhouette scores between 0.0324 and 0.047 reinforce data similarity within clusters. Further, DB index values of 2.631 to 3.264 and CH index values of 0.3688 to 0.43 reflect well-balanced cluster quality and show the adaptability and superior effectiveness of MBGWOSFS on text streams.
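As an illustration of one of the internal metrics named in the abstract, the following is a minimal sketch of the Dunn index: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The abstract does not say which variant or distance the authors used, so the Euclidean distance, the single-linkage separation, the complete-diameter definition, and the name `dunn_index` are all assumptions made for this example, not the paper's implementation.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest cluster diameter.

    Assumption: Euclidean distance, single-linkage separation, complete diameter.
    """
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Dunn index needs at least two clusters")

    # Diameter: largest distance between two points of the same cluster.
    max_diameter = max(
        (
            dist(p, q)
            for members in clusters.values()
            for p, q in combinations(members, 2)
        ),
        default=0.0,
    )
    if max_diameter == 0.0:
        raise ValueError("every cluster is a single point; diameter is zero")

    # Separation: smallest distance between two points of different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters.values(), 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter


# Toy data (hypothetical): two tight groups that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = dunn_index(points, [0, 0, 1, 1])
poor_score = dunn_index(points, [0, 1, 0, 1])
```

With the toy points, the labelling that keeps each tight group together yields a higher value than the labelling that mixes them, which is the sense in which a higher Dunn index indicates better-separated, more compact clusters.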

Published

05.06.2024

How to Cite

Suman R. Tiwari. (2024). A Modified Binary Gray Wolf Optimization for Feature Selection Using Elite Wolf in Unstructured Data Stream. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 4319–4330. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6147

Section

Research Article