Evaluating the Effectiveness of Clustering-Based K-Anonymity and KNN Cluster for Privacy Preservation
Keywords: Data anonymization, Privacy preservation, Clustering-based k-Anonymity, KNN Cluster, K-Member, Privacy-enhancing techniques.
As data-driven innovation accelerates, privacy preservation has become a paramount concern in data anonymization. Growing processing capabilities and ever-larger volumes of collected data raise the risk of unauthorized access and breaches, which makes effective anonymization strategies essential. This paper evaluates three popular techniques: clustering-based k-anonymity, KNN Cluster, and K-Member. Clustering-based k-anonymity groups similar individuals into clusters so that each member of a cluster is indistinguishable from the others in it. The KNN Cluster algorithm forms groups according to the proximity of individuals in the feature space. The K-Member algorithm is designed to identify the most representative records of a given dataset. We performed experiments with k=5 and k=10 and assessed each method's privacy preservation effectiveness in terms of data utility, computational efficiency, and information loss. The results are analyzed and compared to give a comprehensive picture of the strengths and limitations of each approach. The findings can help policymakers, data scientists, and data custodians make informed decisions when implementing anonymization strategies, and they contribute to the development of more robust and efficient privacy-enhancing mechanisms for data-driven applications.
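To make the idea of clustering-based k-anonymity concrete, the following is a minimal Python sketch, not the implementation evaluated in the paper. It assumes purely numeric quasi-identifiers, uses a simplified nearest-neighbour grouping heuristic in place of the actual KNN Cluster or K-Member procedures, and measures information loss as the average normalised width of the generalised ranges. All function names and the toy records are hypothetical.

```python
from collections import Counter
from math import dist


def greedy_k_clusters(records, k):
    """Group record indices into clusters of at least k members.

    Simplified heuristic (an assumption, not the paper's method): take an
    unassigned record as a seed and add its k-1 nearest unassigned
    neighbours by Euclidean distance.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    remaining = list(range(len(records)))
    clusters = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        remaining.sort(key=lambda i: dist(records[seed], records[i]))
        clusters.append([seed] + remaining[: k - 1])
        remaining = remaining[k - 1:]
    # Fewer than k records are left: attach each to the cluster whose
    # seed is closest, so no cluster falls below size k.
    for i in remaining:
        nearest = min(clusters, key=lambda c: dist(records[c[0]], records[i]))
        nearest.append(i)
    return clusters


def generalize(records, clusters):
    """Replace every value by its cluster's (min, max) range per attribute."""
    out = [None] * len(records)
    for cluster in clusters:
        columns = zip(*(records[i] for i in cluster))
        ranges = tuple((min(col), max(col)) for col in columns)
        for i in cluster:
            out[i] = ranges
    return out


def is_k_anonymous(generalized, k):
    """True if every generalised row is shared by at least k records."""
    return all(n >= k for n in Counter(generalized).values())


def information_loss(records, generalized):
    """Average normalised range width: 0 means no loss, 1 means full loss."""
    columns = list(zip(*records))
    spans = [(max(c) - min(c)) or 1 for c in columns]
    total = sum(
        (hi - lo) / span
        for row in generalized
        for (lo, hi), span in zip(row, spans)
    )
    return total / (len(records) * len(columns))


# Toy (age, hours-per-week) records, invented for illustration only.
records = [(25, 40), (27, 38), (26, 45), (52, 40), (50, 35), (55, 60),
           (31, 20), (33, 25), (29, 22), (60, 50), (58, 45)]

for k in (5, 10):
    clusters = greedy_k_clusters(records, k)
    anonymized = generalize(records, clusters)
    print(k, is_k_anonymous(anonymized, k),
          round(information_loss(records, anonymized), 3))
```

The loop at the end mirrors the two settings named in the abstract (k=5 and k=10) and reports, for each, whether the generalised table satisfies k-anonymity and how much information was lost under this particular metric. A faithful comparison would substitute the actual algorithms and the loss measure used in the study.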
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, readers must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.