An Extended Clusters Assessment Method with the Multi-Viewpoints for Effective Visualization of Data Partitions
Keywords: Big Data, Cluster Analysis, Cluster Tendency, cVAT, Multi-Viewpoints, VAT
Cluster analysis is essential for partitioning unlabelled data in many big data applications. It groups data objects according to their similarity features. The analysis involves two significant steps: assessing the initial cluster tendency and exploring the data partitions. Leading big data clustering techniques, such as k-means++, single pass k-means (spkm), mini-batch k-means (mbkm), and spherical k-means, generate big data clusters effectively. However, they provide no prior knowledge of the clustering tendency, that is, an estimate of the number of clusters. A survey of cluster tendency estimation methods indicates that visual assessment of cluster tendency (VAT) assesses the clustering tendency accurately. The accuracy of VAT in turn depends on finding accurate similarity features for the data objects. This paper proposes a novel computational similarity measure for improved assessment of big data clusters. Experiments on large synthetic and real datasets illustrate the efficiency of the proposed technique.
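To make the idea of visual cluster tendency assessment concrete, the following is a minimal sketch of the standard VAT reordering step: objects are ordered with a Prim-style nearest-neighbour sweep over a pairwise dissimilarity matrix, so that clusters tend to appear as dark blocks along the diagonal of the reordered matrix. It does not reproduce the similarity measure proposed in this paper. Euclidean distance is used here only as an assumed stand-in, and the function names `pairwise_euclidean`, `vat_order`, and `vat_image` are hypothetical.

```python
import numpy as np


def pairwise_euclidean(X):
    """Pairwise Euclidean dissimilarity matrix (an assumed stand-in measure)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def vat_order(D):
    """Return the VAT ordering for a square, symmetric dissimilarity matrix D."""
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity in D.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    i = int(i)
    order = [i]
    remaining = [k for k in range(n) if k != i]
    # min_dist[k] is the smallest dissimilarity from object k
    # to any object that has already been placed in the ordering.
    min_dist = D[i].astype(float).copy()
    while remaining:
        # Pick the unplaced object closest to the placed set.
        j = min(remaining, key=lambda k: min_dist[k])
        order.append(j)
        remaining.remove(j)
        min_dist = np.minimum(min_dist, D[j])
    return np.array(order)


def vat_image(D):
    """Reorder rows and columns of D by the VAT ordering."""
    order = vat_order(D)
    return D[np.ix_(order, order)], order
```

Displaying the reordered matrix as a grayscale image (for example with matplotlib's `imshow`) and counting the dark diagonal blocks gives a visual estimate of the number of clusters. A different dissimilarity measure can be substituted by passing another matrix `D` to `vat_image`.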
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.