Analysis of Single-View and Multi-view K-Means Clustering on Big Data Environment

Authors

  • Satish S. Banait Department of Computer Engineering, K.K. Wagh IEER, Nashik, SPPU Pune, India
  • Namrata D. Ghuse Department of Computer Engineering, K.K. Wagh IEER, Nashik, SPPU Pune, India
  • Dipak D. Bage Department of Information Technology, Sandip Institute of Technology & Research Centre, Nashik
  • Sonali N. Jadhav Department of Computer Engineering, K.K. Wagh IEER, Nashik, SPPU Pune, India
  • Avinash A. Somatkar Vishwakarma Institute of Information Technology, Pune
  • Vinod B. Bhamare Department of Computer Engineering, Sandip Institute of Technology & Research Centre, Nashik

Keywords:

Multi-view dataset, Multi-view clustering techniques, K-means, Jaccard Coefficient (Jacc), Fowlkes Mallows Index (FM), Normalized Mutual Information (NMI), Rand Index (RI)

Abstract

Owing to rapid advances in signal-sensing devices and their availability to the general public, real-time datasets now often comprise multiple views. Such multi-view datasets are common in the big data era. Compared with single-view learning, multi-view learning offers several benefits. Clustering is a widely used technique in machine learning and data mining. Traditional clustering techniques use only a single set of features of the available dataset. For a multi-view dataset with multiple feature sets, however, how to combine all of the data views is a major concern; this is termed the multi-view clustering problem. The key benefits of multi-view clustering over single-view clustering are a more accurate description of the data, reduced noise, and a wider range of applications. This work compares the multi-view K-means clustering available in the mvlearn Python package with the traditional K-means clustering technique. Two datasets are used for the comparison: nutrimouse and a simulated dataset. To analyze the impact of multi-view clustering on clustering quality, traditional K-means is applied to the individual views and to the concatenated view of both datasets, followed by multi-view K-means on both datasets. We evaluate clustering quality using the Jaccard Coefficient (Jacc), Fowlkes Mallows Index (FM), Normalized Mutual Information (NMI), Rand Index (RI), and clustering execution time.
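The four quality measures named in the abstract all compare a predicted clustering against reference labels. As a minimal sketch of how they are commonly defined (not the authors' implementation), the standard-library Python below computes RI, Jacc, and FM from pair counts and NMI from label entropies. The function names are illustrative, and the arithmetic-mean normalization used for NMI is an assumption, since the abstract does not state which normalization was used.

```python
from collections import Counter
from math import comb, log, sqrt


def pair_counts(truth, pred):
    """Classify all sample pairs by whether each labeling groups them together.

    Returns (tp, fp, fn, tn); tp counts pairs placed together in both labelings.
    """
    n = len(truth)
    if n != len(pred):
        raise ValueError("labelings must have the same length")
    same_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    same_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred).values())
    tp = same_both
    fn = same_truth - same_both  # together in truth, split in pred
    fp = same_pred - same_both   # together in pred, split in truth
    tn = comb(n, 2) - tp - fp - fn
    return tp, fp, fn, tn


def rand_index(truth, pred):
    """Fraction of sample pairs on which the two labelings agree."""
    tp, fp, fn, tn = pair_counts(truth, pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 1.0


def jaccard(truth, pred):
    """Pairs together in both, over pairs together in at least one labeling."""
    tp, fp, fn, _ = pair_counts(truth, pred)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0


def fowlkes_mallows(truth, pred):
    """Geometric mean of pairwise precision and recall."""
    tp, fp, fn, _ = pair_counts(truth, pred)
    denom = sqrt((tp + fp) * (tp + fn))
    return tp / denom if denom else 0.0


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(truth, pred):
    """Mutual information divided by the mean of the two entropies.

    Assumption: arithmetic-mean normalization; other variants exist.
    """
    n = len(truth)
    ct, cp = Counter(truth), Counter(pred)
    mi = sum(
        (c / n) * log(c * n / (ct[t] * cp[p]))
        for (t, p), c in Counter(zip(truth, pred)).items()
    )
    denom = (entropy(truth) + entropy(pred)) / 2
    return mi / denom if denom else 1.0


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    pred = [0, 0, 1, 2]  # second true cluster split in two
    for name, fn_ in [("RI", rand_index), ("Jacc", jaccard),
                      ("FM", fowlkes_mallows), ("NMI", nmi)]:
        print(f"{name}: {fn_(truth, pred):.4f}")
```

All four scores depend only on how samples are grouped, not on the label values, so a relabeled but otherwise identical clustering scores 1.0. The same functions could be applied unchanged to the labels produced by K-means on each individual view, on the concatenated view, and by multi-view K-means.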

Published

23.02.2024

How to Cite

Banait, S. S., Ghuse, N. D., Bage, D. D., Jadhav, S. N., Somatkar, A. A., & Bhamare, V. B. (2024). Analysis of Single-View and Multi-view K-Means Clustering on Big Data Environment. International Journal of Intelligent Systems and Applications in Engineering, 12(16s), 713–722. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5028

Section

Research Article