Code Plagiarism and Originality Detection using Machine Learning for Ethical Code Practices

Authors

  • Harshali Patil, Head of Department, Computer Engineering, Thakur College of Engineering and Technology, Kandivali, Mumbai-400101, INDIA
  • Siddhi Ambre, Assistant Professor, Department of Computer Engineering, Thakur College of Engineering and Technology, Kandivali, Mumbai-400101, INDIA
  • Karuna Bhosale, Assistant Professor, School of Engineering & Technology, Pimpri Chinchwad University, Talegaon, Pune-410506, INDIA
  • Anamika Singh, Assistant Professor, Department of CS&E (Cyber Security), Thakur College of Engineering and Technology, Kandivali, Mumbai-400101, INDIA
  • Harsh Jha, Department of Computer Engineering, Thakur College of Engineering and Technology, Kandivali, Mumbai-400101, INDIA
  • Vedika Mandre, Department of Computer Engineering, Thakur College of Engineering and Technology, Kandivali, Mumbai-400101, INDIA
  • Ankit Maurya, Department of Computer Engineering, Thakur College of Engineering and Technology, Kandivali, Mumbai-400101, INDIA

Keywords:

Code Analysis, Code Originality System, Code Similarity, Ethical Coding Practices, Plagiarism Detection

Abstract

This paper presents the design and development of a Code Originality System, a software solution intended to protect the intellectual property rights of developers, uphold code quality, and promote ethical coding practices. The primary goal of the research is a system that applies several algorithms to analyze code similarity and detect plagiarism. Using token-based approaches and machine learning models, the system addresses the challenges of code diversity, scalability, privacy, and algorithmic precision. The paper describes the system's architectural components, customizable features, and integration capabilities, and discusses its role in safeguarding code integrity. The findings, supported by statistical data, indicate that the system is effective in tackling these challenges and highlight its contribution to innovation and to responsible and ethical software development practices.
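As an illustration of what a token-based similarity check can look like, the following is a minimal sketch and not the authors' implementation. It assumes Python source as input, replaces identifiers and literals with placeholders so that renaming variables does not hide copying, and compares the two token streams by Jaccard similarity over token k-grams. The function names (normalized_tokens, kgrams, token_similarity) and the choice of k = 4 are hypothetical, chosen only for this example.

```python
import io
import keyword
import tokenize

# Token types that carry no structural information for this comparison.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalized_tokens(source):
    """Tokenize Python source, replacing identifiers and literals with placeholders."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")    # any identifier, so renaming has no effect
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            tokens.append("LIT")   # any numeric or string literal
        else:
            tokens.append(tok.string)  # keywords, operators, punctuation
    return tokens


def kgrams(tokens, k=4):
    """Return the set of all contiguous k-token windows."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def token_similarity(source_a, source_b, k=4):
    """Jaccard similarity of the normalized token k-gram sets (0.0 to 1.0)."""
    grams_a = kgrams(normalized_tokens(source_a), k)
    grams_b = kgrams(normalized_tokens(source_b), k)
    union = grams_a | grams_b
    if not union:
        return 0.0  # both inputs too short to form a single k-gram
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
    renamed = "def add_all(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"
    print(token_similarity(original, renamed))
```

In this sketch, two programs that differ only in identifier names produce identical normalized token streams and therefore score as fully similar. A production system would typically add language-specific lexers, fingerprint selection for scalability, and a learned model on top of such features.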

Published

24.03.2024

How to Cite

Patil, H., Ambre, S., Bhosale, K., Singh, A., Jha, H., Mandre, V., & Maurya, A. (2024). Code Plagiarism and Originality Detection using Machine Learning for Ethical Code Practices. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 209–215. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5242

Section

Research Article