An Extended Clusters Assessment Method with the Multi-Viewpoints for Effective Visualization of Data Partitions

Authors

  • Aswani Kumar Unnam, Research Scholar, Department of CSE, Acharya Nagarjuna University, Guntur, Andhra Pradesh, India
  • Bandla Srinivasa Rao, Professor in CSE, Department of CSE, Bhaskar Engineering College, Hyderabad, Telangana, India

Keywords:

Big Data, Cluster Analysis, Cluster Tendency, cVAT, Multi-Viewpoints, VAT

Abstract

Cluster analysis is essential for partitioning unlabelled data in many big data applications. It groups data objects according to the similarity of their features. Cluster analysis involves two significant steps: assessing the initial cluster tendency and exploring the data partitions. Widely used big data clustering techniques, such as k-means++, single pass k-means (spkm), mini-batch k-means (mbkm), and spherical k-means, effectively generate big data clusters. However, they provide no prior knowledge of the clustering tendency, that is, an estimate of the number of clusters present in the data. Various methods for estimating cluster tendency are surveyed, and visual assessment of cluster tendency (VAT) is found to assess it accurately. Finding accurate similarity features plays a vital role in the accuracy of the clusters assessed by the VAT algorithm. This paper proposes a novel computational similarity measure to improve the assessment of big data clusters. Experiments on big synthetic and big real datasets illustrate the efficiency of the proposed technique.
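The VAT step mentioned above reorders a dissimilarity matrix as described by Bezdek and Hathaway (2002). It starts from an object involved in the largest dissimilarity. It then repeatedly appends the not-yet-ordered object that is nearest to the already-ordered set, which is a Prim-style minimum-spanning-tree traversal. In the reordered image, clusters appear as dark blocks along the diagonal.

The block below is a minimal sketch in Python with NumPy. It assumes Euclidean distance as the dissimilarity. The paper's proposed multi-viewpoint similarity measure is not reproduced here; it would take the place of the hypothetical `pairwise_euclidean` helper.

```python
import numpy as np


def pairwise_euclidean(X):
    """Hypothetical helper: D[i, j] = Euclidean distance between rows i and j of X.

    Stands in for the paper's own similarity measure, which is not reproduced here.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def vat(D):
    """Reorder a symmetric dissimilarity matrix D in the VAT manner.

    Returns (R_star, order), where R_star = D[order][:, order].
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Step 1: start from a row that contains the largest dissimilarity.
    first, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(first)]
    unordered = np.ones(n, dtype=bool)
    unordered[first] = False
    # nearest[j]: smallest dissimilarity from object j to any already-ordered object.
    nearest = D[first].copy()
    # Step 2: repeatedly append the unordered object closest to the ordered set.
    for _ in range(n - 1):
        candidates = np.where(unordered, nearest, np.inf)
        j = int(np.argmin(candidates))
        order.append(j)
        unordered[j] = False
        nearest = np.minimum(nearest, D[j])
    order = np.array(order)
    # Step 3: permute rows and columns together to obtain the reordered matrix.
    return D[np.ix_(order, order)], order


if __name__ == "__main__":
    # Two well-separated 1-D groups, deliberately interleaved in the input.
    X = np.array([[0.0], [0.1], [10.0], [10.2], [0.2], [10.1]])
    R_star, order = vat(pairwise_euclidean(X))
    print(order)  # points near 0 are listed first, then points near 10
    print(np.round(R_star, 1))
```

To inspect the result visually, the reordered matrix can be drawn as a grayscale image, for example with matplotlib's `plt.imshow(R_star, cmap="gray")`. The number of dark diagonal blocks then suggests the number of clusters. This direct form stores the full n × n matrix, so it only suits modest n.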

Figure: VAT, illustrative example

Published

16.01.2023

How to Cite

A. K. Unnam and B. S. Rao, “An Extended Clusters Assessment Method with the Multi-Viewpoints for Effective Visualization of Data Partitions”, Int J Intell Syst Appl Eng, vol. 11, no. 1s, pp. 51–56, Jan. 2023.