Software metrics and plagiarism detection

Abstract. The reliability of plagiarism detection systems, which try to identify similar programs in large populations, depends critically on the choice of program representation. Software metrics conventionally used as representations are described, and the limitations of metrics adapted from software complexity measures are outlined. An application-specific metric is proposed that represents the structure of a program as a variable-length profile. Its constituent terms, each recording the control structures in a program fragment, are ordered for efficient comparison. The superior performance of the plagiarism detection system based on this profile is reported, and the derivation of complexity measures from the profile is discussed.