A New Approach to Determine Eps Parameter of DBSCAN Algorithm
DOI:
https://doi.org/10.18201/ijisae.2017533899
Keywords:
AE-DBSCAN, Clustering, Data Mining, Density-Based Clustering
Abstract
In recent years, data analysis has become increasingly important as data volumes have grown. Clustering, which groups objects according to their similarity, plays an important role in data analysis. DBSCAN is one of the most effective and popular density-based clustering algorithms and has been applied successfully in many areas. However, determining its two input parameters, the neighborhood radius Eps and the minimum number of points MinPts, is challenging, and their values significantly affect the algorithm's clustering performance. In this study, we propose the AE-DBSCAN algorithm, which includes a new method for determining the neighborhood radius Eps automatically. Experimental evaluations showed that the proposed method outperformed the analytical DBSCAN.
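To make the role of the two parameters concrete, the following is a minimal sketch of textbook DBSCAN in plain Python, run on a small made-up point set with several Eps values. It is not the AE-DBSCAN method proposed in the paper, whose details are not given in this abstract. The function names (`region_query`, `dbscan`) and the toy coordinates are assumptions introduced only for illustration.

```python
import math
from collections import deque

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: one label per point, 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
    return labels


# Hypothetical toy data: two tight groups of four points and one distant outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (20, 20)]

for eps in (0.5, 1.5, 8.0):
    print(eps, dbscan(points, eps, min_pts=3))
# 0.5 [-1, -1, -1, -1, -1, -1, -1, -1, -1]   Eps too small: everything is noise
# 1.5 [0, 0, 0, 0, 1, 1, 1, 1, -1]           two clusters plus one noise point
# 8.0 [0, 0, 0, 0, 0, 0, 0, 0, -1]           Eps too large: the two groups merge
```

With MinPts fixed at 3, the same data yields no clusters, two clusters, or one merged cluster depending only on Eps, which is the sensitivity that motivates choosing Eps automatically.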
License
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, users must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.