A Modified Binary Gray Wolf Optimization for Feature Selection Using Elite Wolf in Unstructured Data Stream
Keywords:
cosine proximity, landmark window, session window, tumbling window
Abstract
Stream clustering makes feature selection difficult because incoming data streams are dynamic, varied, and unlabelled. Existing methods rely on labelled data, but assuming structure in heterogeneous, unlabelled streams is unrealistic. To address this, we introduce a novel feature selection method, modified binary gray wolf optimization for stream feature selection (MBGWOSFS) using elite wolf, which applies an evolutionary algorithm to unsupervised learning in streaming environments. The method aims to enhance clustering performance by selecting relevant features from unstructured data streams. Evaluation with internal metrics (Dunn index, Davies-Bouldin index, Calinski-Harabasz index, Silhouette score, separation, and compactness) shows that MBGWOSFS outperforms traditional methods, providing effective feature selection without relying on labelled data or predefined structures. Across varying feature counts, the method achieves Dunn indices ranging from 57.846 to 72.7538, indicating strong cluster separation, and Silhouette scores between 0.0324 and 0.047, reflecting data similarity within clusters. Further, the well-balanced cluster quality, reflected in DB index values of 2.631 to 3.264 and CH index values of 0.3688 to 0.43, shows the adaptability and effectiveness of MBGWOSFS on text streams.
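To make the link between a selected feature subset and an internal metric concrete, the following is a minimal sketch of the Dunn index in its common textbook form (smallest distance between points of different clusters divided by the largest cluster diameter), evaluated with and without a binary feature mask. It is an illustration only and not the evaluation code behind the reported results: the helper names `select` and `dunn_index`, the Euclidean distance, and the toy data are all assumptions.

```python
from itertools import combinations
from math import dist


def select(points, mask):
    """Keep only the coordinates whose mask bit is 1 (hypothetical helper)."""
    return [tuple(x for x, keep in zip(p, mask) if keep) for p in points]


def dunn_index(clusters):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    # Separation: closest pair of points drawn from two different clusters.
    separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Compactness: widest pair of points inside any single cluster.
    diameter = max(
        dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return separation / diameter


# Toy data (assumed): feature 0 separates the two clusters, feature 1 is noise.
clusters = [
    [(0.0, 5.0), (1.0, 0.0)],
    [(10.0, 0.0), (11.0, 5.0)],
]

# Score with all features, then with a binary mask that keeps only feature 0.
full = dunn_index(clusters)
masked = dunn_index([select(c, (1, 0)) for c in clusters])
print(f"all features: {full:.3f}, feature 0 only: {masked:.3f}")
```

In this toy case, dropping the noisy feature shrinks the cluster diameters without reducing the separation, so the Dunn index rises. A wrapper-style selector could use such a score as an unsupervised fitness signal, since it needs no labels.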
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license requires users to give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.