Incremental Fuzzy Clustering Algorithm For Large Datasets

Authors

  • Ani Davis K., Raj Mathew

Keywords:

Incremental Fuzzy c-Means Double Clustering, Fuzzy c-Means, Silhouette, Davies-Bouldin, Calinski-Harabasz

Abstract

Clustering streaming data poses challenges that traditional batch processing does not. Conventional clustering methods struggle with the volume, velocity, and variety of such data because memory, computing power, and processing time are limited. In particular, large datasets cannot be held in storage all at once, must be processed in a single pass, and are subject to concept drift. This paper proposes incremental fuzzy double clustering (IFDC) to address these challenges. IFDC combines Fuzzy c-Means (FCM) with incremental clustering. It divides the data into blocks sized to the available memory and clusters each block in turn. Representative samples from each block are selected using stratified sampling and the k-Medoid method and are carried over to the next block, so that the essence of the data is preserved from the first block to the last. Newly arrived data can then be merged with the last block and clustered, rather than re-clustering the entire dataset each time data arrives. IFDC was evaluated using the Silhouette, Davies-Bouldin, and Calinski-Harabasz indices, and the results show that it outperforms traditional techniques such as FCM and k-means on large data. By continuously accommodating new data and relying on sampling, IFDC improves accuracy, reduces execution time, and eliminates the need for complete re-clustering.
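
The block-by-block procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (fcm, select_representatives, incremental_cluster), the carry-over fraction frac, the fuzzifier m = 2, and the synthetic two-cluster data are all assumptions made for illustration. In particular, the selection step here is a simplified stand-in that keeps each cluster's medoid plus a random stratified sample per cluster; the abstract does not specify how stratified sampling and k-Medoid are combined in IFDC.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain Fuzzy c-Means. Returns (centers, U), U of shape (n, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # memberships of a sample sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                    # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def select_representatives(X, U, frac=0.2, seed=0):
    """Assumed selection rule: per hard-labelled cluster (stratum), keep the
    medoid plus a random sample of about `frac` of the members."""
    rng = np.random.default_rng(seed)
    labels = U.argmax(axis=1)
    keep = []
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        pts = X[idx]
        # medoid: member with the smallest total distance to the other members
        dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
        medoid = int(idx[dist.sum(axis=1).argmin()])
        n_keep = max(1, int(round(frac * len(idx))))
        sampled = rng.choice(idx, size=n_keep, replace=False)
        keep.extend(set(sampled.tolist()) | {medoid})
    return X[sorted(keep)]

def incremental_cluster(blocks, c, frac=0.2):
    """Cluster blocks one at a time, carrying representatives forward."""
    carried, centers = None, None
    for block in blocks:
        data = block if carried is None else np.vstack([carried, block])
        centers, U = fcm(data, c)
        carried = select_representatives(data, U, frac)
    return centers, carried

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic data: two well-separated Gaussian blobs (illustrative only).
    X = np.vstack([rng.normal(mu, 0.5, size=(300, 2))
                   for mu in ([0.0, 0.0], [6.0, 6.0])])
    rng.shuffle(X)
    blocks = np.array_split(X, 6)                # stand-in for memory-sized blocks
    centers, carried = incremental_cluster(blocks, c=2)
    print("final centers:\n", centers)
    print("samples carried forward:", len(carried), "of", len(X))
```

In this sketch, only the current block and the carried representatives are in memory at any time, which is the property the abstract attributes to IFDC. To score a result in the way the abstract describes, the hard labels (U.argmax(axis=1)) could be passed to scikit-learn's silhouette_score, davies_bouldin_score, and calinski_harabasz_score, assuming that library is available.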

Published

12.06.2024

How to Cite

Ani Davis K. (2024). Incremental Fuzzy Clustering Algorithm For Large Datasets. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 2134 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6549

Section

Research Article