Studies on the Use of Various Noise Strategies for Perturbing Data in Privacy-Preserving Data Mining

Authors

  • Prarthana A. Deshkar, Assistant Professor, Department of Computer Technology, Yeshwantrao Chavan College of Engineering, Nagpur
  • Jaikumar M. Patil, Associate Professor, Department of CSE, Shri Sant Gajanan Maharaj College of Engineering, Shegaon
  • Pornima B. Niranjane, Assistant Professor, Department of CSE, Babasaheb Naik College of Engineering
  • Vaishali Niranjane, Assistant Professor, Department of Electronics and Telecommunication, Yeshwantrao Chavan College of Engineering, Nagpur
  • Nisha Thakur, Assistant Professor, Faculty of Engineering and Technology, DMIHER (DU), Sawangi, Wardha
  • Vaibhav D. Dabhade, Assistant Professor, MET Institute of Engineering, Nashik

Keywords

Privacy Preserving, Data Mining, Data Perturbation, Maximum A Posteriori (MAP), Principal Component Analysis (PCA)

Abstract

Privacy-preserving data mining techniques broadly fall into two categories: randomization and value distortion. Randomization replaces an existing value with a non-existing one, whereas value distortion modifies every value in the database. Data perturbation is a widely used randomization technique that aims to deliver both valid data mining results and privacy protection. Existing research has studied additive and multiplicative forms of data perturbation. Privacy is measured by the error that different types of attacks incur when reconstructing the original data: the higher the reconstruction error, the stronger the privacy. Attacks on perturbed data are normally noise-filtering methods based on linear or nonlinear filtering schemes. A linear filtering process produces output that is a linear combination of the input values. The perturbed copies in additive and multiplicative data perturbation are generated with Laplace noise. When the input data follow a linear distribution and are perturbed, linear filtering schemes are used to reconstruct the data and analyze the privacy measure. In nonlinear filtering schemes, the output of the filtering process does not follow a linear distribution. Consequently, linear noise-filtering schemes cannot give an accurate analysis of nonlinearly distributed data. The attacks considered in earlier work are based only on linear filtering schemes. The perturbed models are evaluated in terms of both privacy and data mining utility. This work describes both linear and nonlinear attacks on the generated perturbed data. Attacks on additive data perturbation include Maximum A Posteriori (MAP) and Principal Component Analysis (PCA) based filtering methods.
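The abstract names the ingredients but not the paper's exact formulations, so the following is only a minimal sketch under stated assumptions, using the Python standard library. It perturbs synthetic correlated data with additive Laplace noise (drawn as the difference of two exponentials), then runs two attacks. The first is a linear PCA-based filter that projects the perturbed rows onto their top principal component. The second is a nonlinear MAP estimate that, assuming a Gaussian prior per attribute and a known noise scale, reduces to clipping each perturbed value into an interval around the attribute mean. Privacy is reported as the RMSE between original and reconstructed values. The function names, the one-factor synthetic data, the Gaussian prior, the known noise scale and the choice of a single component are hypothetical assumptions, not the authors' method.

```python
import math
import random


def laplace(scale, rng):
    """Draw Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def additive_perturb(data, scale, rng):
    """Additive perturbation y = x + n with independent Laplace noise on every value."""
    return [[v + laplace(scale, rng) for v in row] for row in data]


def column_stats(data):
    """Column means and sample covariance matrix of a list of equal-length rows."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
            for j in range(d)] for i in range(d)]
    return means, cov


def top_eigenvectors(mat, k, iters=300):
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration and deflation."""
    d = len(mat)
    m = [row[:] for row in mat]
    vecs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
        vecs.append(v)
        # Deflate: remove the found component so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return vecs


def pca_filter_attack(perturbed, k):
    """Linear attack: keep only the top-k principal directions of the perturbed data,
    discarding the noise that lies in the remaining directions."""
    means, cov = column_stats(perturbed)
    comps = top_eigenvectors(cov, k)
    estimate = []
    for row in perturbed:
        centred = [y - mu for y, mu in zip(row, means)]
        rec = list(means)
        for v in comps:
            coef = sum(c * vi for c, vi in zip(centred, v))
            rec = [r + coef * vi for r, vi in zip(rec, v)]
        estimate.append(rec)
    return estimate


def map_attack(perturbed, scale):
    """Nonlinear attack. Hypothetical assumptions: Gaussian prior N(mu, s2) per attribute
    and known Laplace(0, scale) noise. The posterior mode is then y clipped to
    [mu - s2/scale, mu + s2/scale]."""
    means, cov = column_stats(perturbed)
    d = len(means)
    # Var(y) = s2 + 2*scale**2 for independent Laplace noise; floor keeps s2 positive.
    s2 = [max(cov[j][j] - 2.0 * scale ** 2, 1e-6) for j in range(d)]
    bounds = [(means[j] - s2[j] / scale, means[j] + s2[j] / scale) for j in range(d)]
    return [[min(max(y, lo), hi) for y, (lo, hi) in zip(row, bounds)] for row in perturbed]


def rmse(a, b):
    """Root-mean-square error between two equally shaped tables."""
    sq = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    rng = random.Random(7)
    loadings = [1.0, 2.0, -1.0]  # hypothetical: three attributes driven by one latent factor
    original = []
    for _ in range(3000):
        z = rng.gauss(0.0, 1.0)
        original.append([a * z + rng.gauss(0.0, 0.1) for a in loadings])

    perturbed = additive_perturb(original, scale=2.0, rng=rng)
    print("no attack :", round(rmse(original, perturbed), 3))
    print("PCA attack:", round(rmse(original, pca_filter_attack(perturbed, k=1)), 3))
    print("MAP attack:", round(rmse(original, map_attack(perturbed, scale=2.0)), 3))
```

On this synthetic data both attacks are expected to report a lower error than the unattacked perturbed copy, which is the sense in which a filtering attack lowers the measured privacy. The PCA filter is linear and relies on correlation between attributes; the MAP clip acts on each attribute separately and is nonlinear, matching the linear/nonlinear distinction drawn in the abstract.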

Published

13.12.2023

How to Cite

Deshkar, P. A., Patil, J. M., Niranjane, P. B., Niranjane, V., Thakur, N., & Dabhade, V. D. (2023). Studies on the Use of Various Noise Strategies for Perturbing Data in Privacy-Preserving Data Mining. International Journal of Intelligent Systems and Applications in Engineering, 12(8s), 281–289. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4119

Issue

Vol. 12 No. 8s

Section

Research Article
