Two-Phase Privacy Preserving Big Data Hybrid Clustering for Multi-Party Data Sharing
Keywords:
Big data, clustering, multiparty communication, partitioned database, privacy-preserving, secure channel
Abstract
This study addresses privacy-preserving data clustering: running a clustering algorithm on the union of datasets held by several parties without disclosing any further information. The problem is an instance of secure multi-party computation and can be addressed using existing protocols. DBSCAN and K-Medoid apply to all data types and produce clusters identical to conventional ones, whereas other clustering methods are relevant only to certain kinds of data. In their standard form, however, DBSCAN and K-Medoid are best suited for use with a single database. We propose a method for determining the separation (distance) between data points when the data are split across two servers. Building on it, we present a modified privacy-preserving hybrid clustering algorithm that can be applied to datasets that have been vertically and horizontally partitioned and are spread over numerous nodes in a network. In the experiments, the proposed technique outperformed the existing methods it was compared against.
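As a rough illustration of the two-server setting mentioned in the abstract, and not the protocol proposed in the paper, the sketch below shows one common way a distance can be assembled when attributes are vertically partitioned: each party computes the squared Euclidean distance over its own attributes, and the partial results are combined through additive secret sharing so that neither party sees the other's partial value. The modulus, the fixed-point scale, and all function names are assumptions made for this example.

```python
import secrets

MODULUS = 2**61 - 1   # assumed modulus for additive sharing
SCALE = 10**6         # assumed fixed-point scale for real-valued distances


def partial_sq_dist(a, b):
    """Squared Euclidean distance over the attributes one party holds."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(value):
    """Map a non-negative real value to a fixed-point integer mod MODULUS."""
    return round(value * SCALE) % MODULUS


def decode(value):
    """Map a fixed-point integer back to a real value."""
    return value / SCALE


def share(value):
    """Split an integer into two additive shares modulo MODULUS."""
    r = secrets.randbelow(MODULUS)
    return r, (value - r) % MODULUS


def secure_sum_of_partials(d_a, d_b):
    """Add two parties' partial squared distances via additive shares.

    Each party shares its own partial value; each side then adds the
    shares it holds, and only the combined total is reconstructed.
    """
    a0, a1 = share(encode(d_a))   # party A keeps a0, sends a1 to B
    b0, b1 = share(encode(d_b))   # party B keeps b1, sends b0 to A
    sum_at_a = (a0 + b0) % MODULUS
    sum_at_b = (a1 + b1) % MODULUS
    return decode((sum_at_a + sum_at_b) % MODULUS)


# Party A holds the first two attributes, party B holds the third.
p_a, q_a = [1.0, 2.0], [4.0, 6.0]
p_b, q_b = [3.0], [3.0]

total = secure_sum_of_partials(partial_sq_dist(p_a, q_a),
                               partial_sq_dist(p_b, q_b))
print(total)  # squared distance between the two full records
```

Revealing the distance itself still leaks information about the records. A full privacy-preserving DBSCAN or K-Medoid protocol would typically keep the sum in shared form and perform only the comparison it needs (for example, against a neighborhood radius) under a secure comparison step.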
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, readers must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.