A Survey on Clustering Algorithms and their Constraints

Authors

  • Maradana Durga Venkata Prasad, Research Scholar, Department of Computer Science and Engineering, Gandhi Institute of Technology and Management (GITAM), Visakhapatnam, Andhra Pradesh, India.
  • Srikanth T., Associate Professor, Department of Computer Science and Engineering, Gandhi Institute of Technology and Management (GITAM), Visakhapatnam, Andhra Pradesh, India.

Keywords:

Clustering, Constraints, Similarity Functions, Clustering Stages, Supervised Learning, Unsupervised Learning, Clustering Algorithms, High-Dimensional Data.

Abstract

Many techniques are in use today for retrieving information from data sources such as databases and files, and clustering is among the most popular of them. This paper concentrates on the constraints that are applied to data sets when they are clustered. It surveys different clustering algorithms (hierarchical, partitioning, grid-based, model-based, etc.) together with their constraints.

Published

17.05.2023

How to Cite

Venkata Prasad, M. D., & T., S. (2023). A Survey on Clustering Algorithms and their Constraints. International Journal of Intelligent Systems and Applications in Engineering, 11(6s), 165–179. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/2838

Section

Research Article