Automated Detecting and Tracing for Plagiarized Programs using Gumbel Distribution Model 


Vol. 16,  No. 6, pp. 453-462, Dec.  2009
10.3745/KIPSTA.2009.16.6.453


  Abstract

Research on detecting, preventing, and judging software plagiarism has become widespread as interest in protecting and authenticating software intellectual property has grown. Many previous studies compared all pairs of submitted programs using attribute counting, token patterns, program parse trees, and similarity-measuring algorithms. A clear-cut model for distinguishing plagiarism from collaboration is also important. This paper proposes a source code clustering algorithm based on a probability model of the extreme value distribution. First, we propose an asymmetric distance measure pdist(Pa,Pb) to measure the similarity of programs Pa and Pb. We then construct the Plagiarism Direction Graph (PDG) for a given program set, using pdist(Pa,Pb) as edge weights, and transform the PDG into a Gumbel Distance Graph (GDG) model, because we found that the distribution of pdist(Pa,Pb) scores resembles the well-known Gumbel distribution. Second, we define a new notion, pseudo-plagiarism, which is a kind of virtual plagiarism forced by a very strong functional requirement in the specification. We conducted experiments with 18 groups of programs (more than 700 source codes) collected from the ICPC (International Collegiate Programming Contest) and KOI (Korean Olympiad for Informatics) programming contests. The experiments showed that most plagiarized codes could be detected with high sensitivity and that our algorithm successfully separated real plagiarism from pseudo-plagiarism.
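
The abstract does not define pdist(Pa,Pb) or the PDG-to-GDG transformation, so the following is only a minimal sketch of the general idea of scoring directed edges against a fitted Gumbel distribution. It is not the authors' algorithm. It assumes that a higher pdist score means a more similar pair, fits the Gumbel parameters by the method of moments, and flags edges whose upper-tail probability falls below a threshold. The function names (`fit_gumbel`, `gumbel_tail`, `suspicious_edges`) and the `alpha` cutoff are hypothetical.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel(scores):
    """Method-of-moments fit of a Gumbel (maximum) distribution.

    Returns (mu, beta): the location and scale parameters.
    """
    beta = stdev(scores) * math.sqrt(6) / math.pi
    mu = mean(scores) - EULER_GAMMA * beta
    return mu, beta


def gumbel_tail(x, mu, beta):
    """Upper-tail probability P(X > x) = 1 - exp(-exp(-(x - mu) / beta))."""
    return -math.expm1(-math.exp(-(x - mu) / beta))


def suspicious_edges(pdist_scores, alpha=0.01):
    """Flag directed edges whose score is improbably high under the fit.

    pdist_scores maps a directed pair (Pa, Pb) to its score. Assumption:
    higher score means more similar; the abstract does not specify this.
    """
    mu, beta = fit_gumbel(list(pdist_scores.values()))
    flagged = {}
    for edge, score in pdist_scores.items():
        p = gumbel_tail(score, mu, beta)
        if p < alpha:
            flagged[edge] = p
    return flagged
```

Because the measure is asymmetric, the keys (Pa, Pb) and (Pb, Pa) are distinct edges and may carry different scores. Fitting on all scores, including the outliers themselves, is a simplification; a more careful version would estimate the parameters from pairs known to be independent.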

  Cite this article

[IEEE Style]

J. H. Ji, G. Woo, H. G. Cho, "Automated Detecting and Tracing for Plagiarized Programs using Gumbel Distribution Model," The KIPS Transactions:PartA, vol. 16, no. 6, pp. 453-462, 2009. DOI: 10.3745/KIPSTA.2009.16.6.453.

[ACM Style]

Jeong Hoon Ji, Gyun Woo, and Hwan Gue Cho. 2009. Automated Detecting and Tracing for Plagiarized Programs using Gumbel Distribution Model. The KIPS Transactions:PartA, 16, 6, (2009), 453-462. DOI: 10.3745/KIPSTA.2009.16.6.453.