Balancing Privacy and Utility in K-Anonymity: A Comparison of Top-Down, Mondrian, and Improved Mondrian Algorithms

Authors

  • Anup Maurya Ph.D. Scholar,Pacific University, Udaipur Rajasthan, India
  • Manuj Joshi Associate Professor, Pacific University, Udaipur Rajasthan, India.

Keywords:

Mondrian, privacy-preserving, algorithm, generalizing, consideration

Abstract

K-anonymity is a privacy-preserving technique used to protect sensitive information in datasets by generalizing or suppressing identifying information. In this research paper, we examine three algorithms for achieving k-anonymity: top-down, Mondrian, and improved Mondrian. The top-down algorithm begins by selecting the highest-level attribute in a hierarchical data structure and generalizing it to the least specific value. The process is then repeated for the next highest-level attribute until k-anonymity is achieved. The Mondrian algorithm is a partition-based approach that recursively splits the dataset into smaller partitions until k-anonymity is achieved. The enhanced Mondrian algorithm takes into account the data distribution within each partition. According to the results of the experiments, the improved algorithm performs better than the top-down and mondrian algorithms when it comes to both execution time and information loss. The latest version of Mondrian's algorithm significantly reduced the number of partition sizes required to achieve K-anonymy. It is a better method than ever before when it comes to protecting large datasets. The ability of the algorithm to take into consideration the distribution of attribute values in each partition allows it to perform more efficient privacy-preserving work.

Downloads

Download data is not yet available.

References

B. Kenig and T. Tassa, “A practical approximation algorithm for optimal k-anonymity,” Data Min. Knowl. Discov., vol. 25, no. 1, pp. 134–168, 2012, doi: 10.1007/s10618-011-0235-9.

J. Soria-Comas, J. Domingo-Ferrer, D. Sánchez, and S. Martínez, “Enhancing data utility in differential privacy via microaggregation-based k-anonymity,” VLDB J., vol. 23, no. 5, pp. 771–794, 2014, doi: 10.1007/s00778-014-0351-4.

P. R. Bhaladhare and D. C. Jinwala, “Novel approaches for privacy preserving data mining in k-anonymity model,” J. Inf. Sci. Eng., vol. 32, no. 1, pp. 63–78, 2016.

J. Domingo-Ferrer and V. Torra, “Ordinal, continuous and heterogeneous k-anonymity through microaggregation,” Data Min. Knowl. Discov., vol. 11, no. 2, pp. 195–212, 2005, doi: 10.1007/s10618-005-0007-5.

C. C. Aggarwal, “On k-anonymity and the curse of dimensionality,” VLDB 2005 - Proc. 31st Int. Conf. Very Large Data Bases, vol. 2, pp. 901–909, 2005.

C. Chiu and C. Tsai, “A k -Anonymity Clustering Method for Effective Data,” pp. 89–99, 2007.

S. Vijayarani, A. Tamilarasi, and M. Sampoorna, “Analysis of privacy preserving K-anonymity methods and techniques,” Proc. 2010 Int. Conf. Commun. Comput. Intell. INCOCCI-2010, pp. 540–545, 2010.

Mahajan, R. ., Patil, P. R. ., Potgantwar, A. ., & Bhaladhare, P. R. . (2023). Novel Load Balancing Optimization Algorithm to Improve Quality-of-Service in Cloud Environment. International Journal on Recent and Innovation Trends in Computing and Communication, 11(2), 57–64. https://doi.org/10.17762/ijritcc.v11i2.6110

R. C. W. Wong, J. Li, A. W. C. Fu, and K. Wang, “(α, k)-anonymity: An enhanced k-anonymity model for privacy-preserving data publishing,” Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., vol. 2006, pp. 754–759, 2006, doi: 10.1145/1150402.1150499.

K. LeFevre, D. J. DeWitt, and R. Ramakrishnan, “Mondrian multidimensional K-anonymity,” Proc. - Int. Conf. Data Eng., vol. 2006, p. 25, 2006, doi: 10.1109/ICDE.2006.101.

A. Friedman, R. Wolff, and A. Schuster, “Providing k-anonymity in data mining,” VLDB J., vol. 17, no. 4, pp. 789–804, 2008, doi: 10.1007/s00778-006-0039-5.

L. Zhang, J. Xuan, R. Si, and R. Wang, “An Improved Algorithm of Individuation K-Anonymity for Multiple Sensitive Attributes,” Wirel. Pers. Commun., vol. 95, no. 3, pp. 2003–2020, 2017, doi: 10.1007/s11277-016-3922-4.

C. Ling, W. Zhang, and H. He, “K-anonymity privacy-preserving algorithm for IoT applications in virtualization and edge computing,” Cluster Comput., vol. 4, 2022, doi: 10.1007/s10586-022-03755-4.

B. Sowmiya and E. Poovammal, “A Heuristic K-Anonymity Based Privacy Preserving for Student Management Hyperledger Fabric blockchain,” Wirel. Pers. Commun., vol. 127, no. 2, pp. 1359–1376, 2021, doi: 10.1007/s11277-021-08582-1.

D. Slijepčević, M. Henzl, L. Daniel Klausner, T. Dam, P. Kieseberg, and M. Zeppelzauer, “k-Anonymity in practice: How generalisation and suppression affect machine learning classifiers,” Comput. Secur., vol. 111, 2021, doi: 10.1016/j.cose.2021.102488.

W. Mahanan, W. A. Chaovalitwongse, and J. Natwichai, “Data privacy preservation algorithm with k-anonymity,” World Wide Web, vol. 24, no. 5, pp. 1551–1561, 2021, doi: 10.1007/s11280-021-00922-2.

Y. T. Tsou et al., “(k, ε, δ)-Anonymization: privacy-preserving data release based on k-anonymity and differential privacy,” Serv. Oriented Comput. Appl., vol. 15, no. 3, pp. 175–185, 2021, doi: 10.1007/s11761-021-00324-2.

Y. C. Tsai, S. L. Wang, I. H. Ting, and T. P. Hong, “Flexible sensitive K-anonymization on transactions,” World Wide Web, vol. 23, no. 4, pp. 2391–2406, 2020, doi: 10.1007/s11280-020-00798-8.

W. Mahanan, W. A. Chaovalitwongse, and J. Natwichai, “Data anonymization: a novel optimal k-anonymity algorithm for identical generalization hierarchy data in IoT,” Serv. Oriented Comput. Appl., vol. 14, no. 2, pp. 89–100, 2020, doi: 10.1007/s11761-020-00287-w.

J. Wang and M. P. Kwan, “Daily activity locations k-anonymity for the evaluation of disclosure risk of individual GPS datasets,” Int. J. Health Geogr., vol. 19, no. 1, pp. 1–14, 2020, doi: 10.1186/s12942-020-00201-9.

K. Arava and S. Lingamgunta, “Adaptive k-Anonymity Approach for Privacy Preserving in Cloud,” Arab. J. Sci. Eng., vol. 45, no. 4, pp. 2425–2432, 2020, doi: 10.1007/s13369-019-03999-0.

K. Murakami and T. Uno, “Optimization algorithm for k-anonymization of datasets with low information loss,” Int. J. Inf. Secur., vol. 17, no. 6, pp. 631–644, 2018, doi: 10.1007/s10207-017-0392-y.

R. M. E. Rajendran Keerthana, Jayabalan Manoj, “A Study on k-anonymity, l-diversity, and t-closeness Techniques focusing Medical Data,” IJCSNS Int. J. Comput. Sci. Netw. Secur., vol. 17, no. 12, pp. 172–177, 2017, [Online]. Available: https://www.researchgate.net/publication/322330948_A_Study_on_k-anonymity_l-diversity_and_t-closeness_Techniques_focusing_Medical_Data.

D. Tran and M. Sokolova, “Applying multi-label and multi-class classification to enhance K-anonymity in sequential releases,” Prog. Artif. Intell., vol. 5, no. 4, pp. 277–288, 2016, doi: 10.1007/s13748-016-0096-y.

K. El Emam et al., “A Globally Optimal k-Anonymity Method for the De-Identification of Health Data,” J. Am. Med. Informatics Assoc., vol. 16, no. 5, pp. 670–682, 2009, doi: 10.1197/jamia.M3144.

Ali Ahmed, Machine Learning in Healthcare: Applications and Challenges , Machine Learning Applications Conference Proceedings, Vol 1 2021.

R. Bredereck, A. Nichterlein, R. Niedermeier, and G. Philip, “The effect of homogeneity on the computational complexity of combinatorial data anonymization,” Data Min. Knowl. Discov., vol. 28, no. 1, pp. 65–91, 2014, doi: 10.1007/s10618-012-0293-7.

“Adult income dataset | Kaggle.” [Online]. Available: https://www.kaggle.com/datasets/wenruliu/adult-income-dataset.

H. Wimmer and L. Powell, “A Comparison of the Effects of K-Anonymity on Machine Learning Algorithms,” Int. J. Adv. Comput. Sci. Appl., vol. 5, no. 11, pp. 1–9, 2014, doi: 10.14569/ijacsa.2014.051126.

K-anonymity (src – datacamp)

Downloads

Published

01.07.2023

How to Cite

Maurya, A. ., & Joshi, M. . (2023). Balancing Privacy and Utility in K-Anonymity: A Comparison of Top-Down, Mondrian, and Improved Mondrian Algorithms. International Journal of Intelligent Systems and Applications in Engineering, 11(7s), 224–235. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/2948