Studies on the Use of Various Noise Strategies for Perturbing Data in Privacy-Preserving Data Mining
Keywords:
Privacy Preserving, Data Mining, Data perturbation, Maximum A Posteriori, Principal Component Analysis (PCA)
Abstract
Privacy-preserving data mining techniques broadly fall into two categories: randomization and value distortion. Randomization replaces an existing value with a non-existing value, whereas value distortion modifies each value in the database. Data perturbation is a widely used randomization technique that promises both valid data mining results and secured privacy. Existing research has been conducted with additive and multiplicative forms of data perturbation. Privacy is measured by the increase in error rate under different types of attacks. Attacks on data perturbation are normally noise filtering methods, based on linear or nonlinear filtering schemes. A linear filtering process produces output data that is a linear combination of the input values. The perturbed copies in additive and multiplicative data perturbation are generated with Laplace noise. When the input data has a linear distribution and is subjected to perturbation, linear filtering schemes are used to reconstruct the data and analyze the privacy measure. In nonlinear filtering schemes, the output of the filtering process is not in a linear distribution. This means that for a nonlinear data distribution, linear noise filtering schemes cannot be used for accurate analysis. Attacks considered so far are based only on linear filtering schemes. The perturbed models are evaluated on both privacy and data mining utility. This work describes both linear and nonlinear types of attacks on the generated perturbed data. Attacks on additive data perturbation include Maximum A Posteriori (MAP) and Principal Component Analysis (PCA) based filtering methods.
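As an illustration of the ideas in the abstract, the following is a minimal Python sketch of additive and multiplicative perturbation with Laplace noise, followed by a PCA-based linear filtering attack. It is not the authors' implementation. The function names, the multiplicative form x * (1 + e), the synthetic low-rank data, the noise scales, and the use of root-mean-square reconstruction error as a stand-in for the privacy measure are all assumptions made for this example.

```python
import numpy as np


def additive_perturb(x, scale, rng):
    """Additive perturbation: y = x + e, with e ~ Laplace(0, scale) i.i.d."""
    return x + rng.laplace(loc=0.0, scale=scale, size=x.shape)


def multiplicative_perturb(x, scale, rng):
    """Multiplicative perturbation (assumed form): y = x * (1 + e),
    with e ~ Laplace(0, scale) i.i.d."""
    return x * (1.0 + rng.laplace(loc=0.0, scale=scale, size=x.shape))


def pca_filter(y, k):
    """PCA-based linear filtering attack: project the perturbed data onto
    its top-k principal components and discard the remaining directions,
    which are assumed to be dominated by noise."""
    mean = y.mean(axis=0)
    centered = y - mean
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return centered @ top @ top.T + mean


def rmse(a, b):
    """Root-mean-square error, used here as a simple proxy for privacy:
    the larger the attacker's reconstruction error, the more privacy remains."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Hypothetical data: 10 correlated attributes driven by 2 latent factors.
rng = np.random.default_rng(0)
original = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 10))

perturbed = additive_perturb(original, scale=1.0, rng=rng)
reconstructed = pca_filter(perturbed, k=2)

print("error before attack:", rmse(perturbed, original))
print("error after PCA filtering:", rmse(reconstructed, original))
```

In this sketch the attack succeeds to the extent that the error after filtering is lower than the error of the perturbed copy itself. The number of retained components k is assumed to be known to the attacker here; in practice it would have to be estimated, for example from the eigenvalue spectrum of the perturbed data.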
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, users must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.