Investigations and Design of Privacy-Preserving Data Mining Technique for Secure Data Publishing

Authors

  • Mayur Rathi, Department of Computer Science & Engineering, Shri Vaishnav Vidyapeeth Vishwavidyalaya, Indore, India
  • Anand Rajavat, Department of Computer Science & Engineering, Shri Vaishnav Vidyapeeth Vishwavidyalaya, Indore, India

Keywords:

PPDM, Privacy, Noise Inclusion Algorithm, Dimensionality Reduction, Experimental Study

Abstract

The growth of digital data motivates data-intensive applications that provide solutions in diverse areas. Such applications sometimes need cross-domain data contributed by multiple data owners. In this setting, data dimensionality and privacy are the main concerns. Recent techniques address the privacy issues with cryptographic and noise-based methods, but these are resource-consuming and suffer from poor data utility. In this paper, we investigate a Privacy-Preserving Data Mining (PPDM) model to deal with the complexities of PPDM modeling. First, we describe the dimensionality issue in PPDM and discuss the use of dimensionality reduction techniques such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Kernel Principal Component Analysis (KPCA), and Correlation Coefficient (CC). Second, we consider the privacy and utility of data, with the aim of balancing privacy requirements against data utility; to this end, a client-side random noise inclusion algorithm is proposed. Third, by selecting suitable and effective techniques, we implement a complete PPDM model. Experiments are performed on publicly available UCI datasets. Finally, performance is discussed in terms of data quality, privacy, and performance metrics.
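
The two-stage idea in the abstract (reduce dimensionality, then add random noise on the client before publishing) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the noise distribution or the selection rule, so the sketch assumes zero-mean Gaussian noise scaled by each column's standard deviation and a simple top-k selection by absolute Pearson correlation with the class label. The function names (`pearson`, `select_features`, `add_noise`), the `scale` parameter, and the toy data are all hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def select_features(rows, labels, k):
    """Indices of the k columns with the largest |correlation| to the labels.

    Assumption: top-k by absolute Pearson correlation stands in for the
    Correlation Coefficient (CC) technique named in the abstract.
    """
    n_cols = len(rows[0])
    scores = []
    for j in range(n_cols):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])


def add_noise(rows, scale, rng):
    """Return a copy of rows with zero-mean Gaussian noise added per cell.

    Assumption: noise standard deviation = scale * column standard deviation.
    A larger scale means more privacy and less utility.
    """
    n = len(rows)
    n_cols = len(rows[0])
    stds = []
    for j in range(n_cols):
        column = [row[j] for row in rows]
        mean = sum(column) / n
        stds.append(math.sqrt(sum((x - mean) ** 2 for x in column) / n))
    return [
        [x + rng.gauss(0.0, scale * stds[j]) for j, x in enumerate(row)]
        for row in rows
    ]


# Toy data (hypothetical): column 0 equals the label, column 1 is
# uncorrelated with it, column 2 is perfectly negatively correlated.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
rows = [[y, c, -3 * y] for y, c in zip(labels, [1, 1, 2, 2, 3, 3, 4, 4])]

selected = select_features(rows, labels, k=2)
reduced = [[row[j] for j in selected] for row in rows]
published = add_noise(reduced, scale=0.1, rng=random.Random(42))
```

In this sketch the data owner would run both steps locally and release only `published`. The `scale` value is the knob that trades privacy against utility; how the paper actually sets it is not stated in the abstract.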

Published

11.07.2023

How to Cite

Rathi, M., & Rajavat, A. (2023). Investigations and Design of Privacy-Preserving Data Mining Technique for Secure Data Publishing. International Journal of Intelligent Systems and Applications in Engineering, 11(9s), 351–367. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3125

Section

Research Article