Two-Phase Privacy Preserving Big Data Hybrid Clustering for Multi-Party Data Sharing

Authors

  • Manjula GS, Research Scholar, Department of Computer Science, Alagappa University, Karaikudi
  • T. Meyyappan, Professor, Department of Computer Science, Alagappa University, Karaikudi, Tamil Nadu, India

Keywords:

Big data, clustering, multiparty communication, partitioned database, privacy-preserving, secure channel

Abstract

This study addresses privacy-preserving data clustering: running a clustering algorithm on the union of datasets held by several parties without disclosing any further information. The problem is an instance of secure multi-party computation and may be addressed using existing protocols. DBSCAN and K-Medoid apply to all data types and produce clusters identical to conventional ones, whereas other clustering methods are relevant only to certain kinds of data. In their standard form, however, DBSCAN and K-Medoid are best suited to a single database. We propose a method for determining the distance (separation) between data points when the data are split across two servers. We also propose a modified version of the privacy-preserving hybrid clustering algorithm that can be applied to datasets that are vertically and horizontally partitioned and spread over numerous nodes in a network. In our experiments, the proposed technique outperformed the existing ones.
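To make the two-server distance idea concrete, the following is a minimal sketch and not the authors' protocol, which the abstract does not specify. It assumes a vertical partition in which each party holds a different subset of attributes for the same records. Because squared Euclidean distance is a sum over attributes, each party can compute a partial result locally and only the partial results need to be combined. The function names and the toy data are hypothetical, and the combination step is done in plaintext here; a privacy-preserving version would have to perform that addition under a secure computation scheme.

```python
import math

def partial_sq_dists(rows):
    """Pairwise squared distances using only the attributes one party holds."""
    n = len(rows)
    return [[sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
             for j in range(n)]
            for i in range(n)]

def combine(part_a, part_b):
    """Add two partial matrices element-wise, then take square roots.

    Plaintext stand-in (assumption): a real protocol would combine the
    partial sums without revealing them to the other party.
    """
    return [[math.sqrt(x + y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(part_a, part_b)]

# Hypothetical toy data: three records, vertically partitioned.
party_a = [(0.0, 0.0), (3.0, 0.0), (0.0, 0.0)]  # attributes 1 and 2
party_b = [(0.0,), (4.0,), (1.0,)]              # attribute 3

dist = combine(partial_sq_dists(party_a), partial_sq_dists(party_b))
```

The resulting matrix equals the one computed on the joined records, so it could be handed to any clustering routine that accepts precomputed distances, which both density-based and medoid-based methods can do in principle.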

Published

12.07.2023

How to Cite

GS, M., & Meyyappan, T. (2023). Two-Phase Privacy Preserving Big Data Hybrid Clustering for Multi-Party Data Sharing. International Journal of Intelligent Systems and Applications in Engineering, 11(9s), 501–510. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3187

Section

Research Article