A Comprehensive Study on Density Peak Clustering and its Variants
Keywords:
Density peak clustering (DPC), cut-off distance parameter, homogeneity, completeness, silhouette coefficient
Abstract
Clustering is a technique for grouping similar data points (samples). Groups of similar data points can be formed using a distance measure or using density. Density peak clustering (DPC) groups data points by density. This paper surveys variations and improvements of DPC and compares the performance of DPC with that of other clustering algorithms. It also addresses a problem in DPC: the cut-off distance parameter (dc) is selected at random. The local density of each data point is calculated from dc, so an improper choice of dc leads to wrong clustering results. This issue is addressed by using the Gini index or a Gaussian function to make a valid guess for dc. Homogeneity, completeness, and the silhouette coefficient are the three measures used to compare the results of DPC, DPC with the Gini index, and DPC with the Gaussian function.
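The dependence of local density on dc can be illustrated with a minimal sketch of the basic DPC procedure. This is not the paper's implementation: the function names (`pairwise_distances`, `local_density`, `dpc`), the `n_clusters` argument, and the specific Gaussian kernel form exp(-(d/dc)^2) are assumptions chosen for illustration, and the Gini-index selection of dc is not shown.

```python
import math

def pairwise_distances(points):
    """Full Euclidean distance matrix (hypothetical helper)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def local_density(dist, dc, kernel="gaussian"):
    """Local density of every point for a given cut-off distance dc."""
    n = len(dist)
    rho = []
    for i in range(n):
        others = (dist[i][j] for j in range(n) if j != i)
        if kernel == "cutoff":
            # Hard cut-off: count the neighbours closer than dc.
            rho.append(float(sum(1 for d in others if d < dc)))
        else:
            # Smooth variant (assumed form): Gaussian-weighted neighbour count.
            rho.append(sum(math.exp(-(d / dc) ** 2) for d in others))
    return rho

def dpc(points, dc, n_clusters, kernel="gaussian"):
    """Sketch of density peak clustering; returns (labels, rho, delta)."""
    dist = pairwise_distances(points)
    n = len(points)
    rho = local_density(dist, dc, kernel)
    # Points in order of decreasing density (stable sort breaks ties by index).
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: delta is its largest distance to any point.
            delta[i] = max(dist[i])
            continue
        # Nearest point among those ranked denser than i.
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j
    # Centres: largest rho * delta (the densest point is always included).
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Remaining points inherit the label of their nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta
```

Calling `dpc(points, dc, n_clusters)` with several values of dc on the same data shows how the densities, and therefore the chosen centres, shift with the cut-off. The resulting labels could then be scored with homogeneity, completeness, and the silhouette coefficient, for example using an external metrics library.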
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets others remix, transform, and build upon the material, provided they give appropriate credit, provide a link to the license, indicate if changes were made, and distribute their contributions under the same license as the original.