A Comprehensive Study on Density Peak Clustering and its Variants

Authors

  • Sarvani Anandarao, School of Computer Science and Engineering, Vellore Institute of Technology (VIT), Chennai 600127, Tamil Nadu, India
  • Sweetlin Hemalatha Chellasamy, School of Computer Science and Engineering, Vellore Institute of Technology (VIT), Chennai 600127, Tamil Nadu, India

Keywords:

Density peak clustering (DPC), cut-off distance parameter, homogeneity, completeness, silhouette coefficient

Abstract

Clustering is a technique for grouping similar data points (samples). Such groups can be formed using either a distance measure or density. Density peak clustering (DPC) groups data points by density. This paper surveys variations and improvements of DPC and compares the performance of DPC with that of other clustering algorithms. It also addresses a problem in DPC: the random selection of the cut-off distance parameter (dc). The local density of each data point is calculated from dc, so an improper choice of dc leads to incorrect clustering results. This issue is addressed by using the Gini index or a Gaussian function to make an informed estimate of dc. Homogeneity, completeness, and the silhouette coefficient are the three measures chosen to compare the results of DPC, DPC with the Gini index, and DPC with a Gaussian function.
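The role of dc described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It follows the general DPC recipe of Rodriguez and Laio: compute a local density for each point from dc, compute each point's distance to the nearest point of higher density, take the points with the largest product of the two as centres, and assign the rest by following the nearest denser neighbour. The function names, the form of the Gaussian kernel exp(-(d/dc)^2), the fixed n_clusters argument, and the toy data are all assumptions made for illustration. The Gini-index estimate of dc and the silhouette coefficient are not shown.

```python
import math
from collections import Counter

def local_density(dist, dc, kernel="cutoff"):
    """Local density: neighbours closer than dc, or a Gaussian-weighted sum (assumed form)."""
    n = len(dist)
    if kernel == "gaussian":
        return [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
                for i in range(n)]
    return [float(sum(1 for j in range(n) if j != i and dist[i][j] < dc))
            for i in range(n)]

def dpc(points, dc, n_clusters, kernel="cutoff"):
    """Hypothetical helper: returns one cluster label per point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = local_density(dist, dc, kernel)
    order = sorted(range(n), key=lambda i: (-rho[i], i))  # densest first, ties by index
    delta, parent = [0.0] * n, [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])  # nearest denser point
        delta[i], parent[i] = dist[i][j], j
    # Centres are the points with the largest rho * delta.
    centres = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_clusters]
    labels = [-1] * n
    for cluster_id, c in enumerate(centres):
        labels[c] = cluster_id
    for i in order:  # parents are always labelled before their children
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def _conditional_entropy(a, b):
    """H(a | b) for two equal-length label lists."""
    n = len(a)
    joint, b_counts = Counter(zip(a, b)), Counter(b)
    return -sum((c / n) * math.log(c / b_counts[bv]) for (_, bv), c in joint.items())

def homogeneity(truth, pred):
    """1 - H(truth | pred) / H(truth); 1.0 when each cluster holds a single class."""
    h = _entropy(truth)
    return 1.0 if h == 0 else 1.0 - _conditional_entropy(truth, pred) / h

def completeness(truth, pred):
    """1 - H(pred | truth) / H(pred); 1.0 when each class falls in a single cluster."""
    return homogeneity(pred, truth)

# Toy data (an assumption): two well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
truth = [0] * 5 + [1] * 5
for kernel in ("cutoff", "gaussian"):
    labels = dpc(points, dc=1.0, n_clusters=2, kernel=kernel)
    print(kernel, labels, homogeneity(truth, labels), completeness(truth, labels))
```

Changing dc in the call above changes the density values and, on less cleanly separated data, can change which points are selected as centres. This is the sensitivity to dc that the abstract refers to.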

Clustering results of K-means

Published

17.02.2023

How to Cite

Anandarao, S., & Hemalatha Chellasamy, S. (2023). A Comprehensive Study on Density Peak Clustering and its Variants. International Journal of Intelligent Systems and Applications in Engineering, 11(2), 216–224. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/2613

Section

Research Article