A Privacy-Preserving Data Mining Approach in Multi-Dimensional Data Set based on the Random and Cumulative Integrated Noise

Authors

S. K. Vyas, S. Karmore, P. Jain

Keywords:

Cumulative noise, data mining, data perturbation, privacy preservation, RPCN

Abstract

Since the emergence of Big Data, data mining has gained unprecedented opportunities to discover new knowledge and support decision-making. As Big Data methods gain traction, however, privacy is becoming a major concern. To address this concern, we employ data perturbation approaches based on random translation and random projection, combined with additive noise. We also assess the performance of the different data perturbation modes against their corresponding attack modes using input-output maximum a posteriori (MAP) attacks. First, we evaluate random projections for online classification. Two data perturbation modalities are assessed: random translation (RT) and random projection (RP). Random projection with independent noise (RPIN) and with cumulative noise (RPCN) are also assessed. Our results show that the combination of two MAP attacks, MAX(A-RT, A-RPCN-1), is the most efficient attack against the RPCN method. As the data record moves away from the known data record, the attack becomes less efficient, indicating that RPCN gradually improves data privacy over time. These perturbation techniques are not limited to classification tasks: they can also be applied to clustering, anomaly detection, and regression, which opens up new research directions. We are also exploring privacy preservation techniques tailored to the streaming nature of real-world data sets, in order to improve privacy for nominal data.
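The perturbation modes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian projection matrix, the translation applied before projection, the noise scale `sigma`, and all function names are assumptions made for illustration. The only point it shows is the difference between independent noise (fresh noise per record) and cumulative noise (noise that accumulates along the stream, so later records sit further from a known record).

```python
import math
import random


def make_projection(k, d, rng):
    """Return a k x d matrix of Gaussian entries scaled by 1/sqrt(k) (assumed form)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Multiply the projection matrix by one record."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def perturb_stream(records, matrix, translation, sigma, mode, rng):
    """Perturb a stream of records.

    mode="independent": each record gets its own fresh noise vector (RPIN-like).
    mode="cumulative":  each record gets the running sum of all noise vectors
                        drawn so far (RPCN-like), an assumed reading of
                        "cumulative noise".
    """
    if mode not in ("independent", "cumulative"):
        raise ValueError("mode must be 'independent' or 'cumulative'")
    k = len(matrix)
    drift = [0.0] * k  # running noise total, used only in cumulative mode
    perturbed = []
    for x in records:
        # Hypothetical random translation, applied before projection.
        shifted = [v + t for v, t in zip(x, translation)]
        y = project(matrix, shifted)
        noise = [rng.gauss(0.0, sigma) for _ in range(k)]
        if mode == "cumulative":
            drift = [a + e for a, e in zip(drift, noise)]
            noise = drift
        perturbed.append([a + e for a, e in zip(y, noise)])
    return perturbed


# Usage: the same 4-dimensional record repeated 200 times, reduced to 3 dimensions.
setup_rng = random.Random(0)
records = [[1.0, 2.0, 3.0, 4.0] for _ in range(200)]
R = make_projection(3, 4, setup_rng)
t = [setup_rng.uniform(-1.0, 1.0) for _ in range(4)]

rpin = perturb_stream(records, R, t, 0.1, "independent", random.Random(1))
rpcn = perturb_stream(records, R, t, 0.1, "cumulative", random.Random(1))
```

With identical seeds the two modes agree on the first record and diverge afterwards, since only the cumulative mode carries earlier noise forward. This is consistent with the abstract's observation that attacks weaken as records move away from a known record, although the sketch does not model any attack.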

Published

23.02.2024

How to Cite

Vyas, S. K., Karmore, S., & Jain, P. (2024). A Privacy-Preserving Data Mining Approach in Multi-Dimensional Data Set based on the Random and Cumulative Integrated Noise. International Journal of Intelligent Systems and Applications in Engineering, 12(16s), 549–561. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4892

Issue

Section

Research Article