Evaluation of Privacy-Preserving Techniques: Bouncy Castle Encryption and Machine Learning Algorithms for Secure Classification of Sensitive Data

Authors

  • Tungar D. V., Department of Computer Engineering, MET's Institute of Engineering, Bhujbal Knowledge City, Adagaon, Nashik (Maharashtra), India; Savitribai Phule Pune University, Pune (Maharashtra), India
  • Patil D. V., Department of Computer Engineering, Gokhale Education Society's R.H. Sapat College of Engineering, Management Studies and Research, Nashik (Maharashtra), India

Keywords:

Machine learning, data encryption, privacy, data privacy, sensitive information security, data security, RSA-2048 algorithm, encryption, predictive accuracy

Abstract

In today's data-driven society, protecting sensitive information is of utmost importance. This paper aims to make data encryption more effective and to evaluate it alongside machine learning (ML) models. It examines privacy-preserving methods that combine ML algorithms with the Bouncy Castle encryption library to classify sensitive data securely. The dataset is encrypted with the RSA-2048 algorithm as provided by Bouncy Castle. To improve privacy protection and system performance, the method applies lookup substitution with k-anonymization, which replaces sensitive values with anonymized ones and thereby reduces data risk. The privacy-preservation techniques are applied to a dataset of employee details, and several classifiers are evaluated on it: AdaBoost M1, Naïve Bayes, Decision Tree J48, Random Forest, and Decision Tree ID3. Their effectiveness is determined using key performance metrics. The results show that Decision Tree J48 outperforms the other classifiers in classification performance. The study also evaluates the efficiency of the encryption process by measuring processing time, encryption time, and decryption time, which sheds light on the technique's impact on data security and time overhead. The findings clarify the value of data encryption in protecting the security and privacy of data and can help decision-makers select appropriate security measures for different use cases.
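
The lookup-substitution step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not describe the implementation, so everything here is an assumption made for illustration: the record fields (`name`, `age`, `dept`), the token format, the age-band generalization, and the helper names `substitute`, `generalize_age`, and `is_k_anonymous`. The sketch replaces a direct identifier with a token held in a lookup table, coarsens one quasi-identifier, and checks whether the result is k-anonymous.

```python
from collections import Counter


def substitute(records, field, prefix="ID"):
    """Replace each distinct value of `field` with a token.

    Returns the substituted records and the lookup table (original -> token).
    """
    lookup = {}
    out = []
    for rec in records:
        value = rec[field]
        if value not in lookup:
            lookup[value] = f"{prefix}{len(lookup) + 1:04d}"
        out.append({**rec, field: lookup[value]})
    return out, lookup


def generalize_age(records, width=10):
    """Coarsen exact ages into bands such as '30-39'."""
    out = []
    for rec in records:
        low = (rec["age"] // width) * width
        out.append({**rec, "age": f"{low}-{low + width - 1}"})
    return out


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)
    return all(count >= k for count in groups.values())


# Hypothetical employee records, invented for illustration only.
employees = [
    {"name": "Asha", "age": 31, "dept": "HR"},
    {"name": "Ravi", "age": 36, "dept": "HR"},
    {"name": "Meera", "age": 42, "dept": "IT"},
    {"name": "John", "age": 47, "dept": "IT"},
]

masked, lookup = substitute(employees, "name")
anonymized = generalize_age(masked)
```

With this toy data, `is_k_anonymous(anonymized, ["age", "dept"], 2)` holds, whereas five-year age bands would leave every record in a group of one. The lookup table reverses the substitution, so it needs the same protection as the original data.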

Published

01.07.2023

How to Cite

Tungar, D. V., & Patil, D. V. (2023). Evaluation of Privacy-Preserving Techniques: Bouncy Castle Encryption and Machine Learning Algorithms for Secure Classification of Sensitive Data. International Journal of Intelligent Systems and Applications in Engineering, 11(7s), 429 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/2976