Detecting collusion in computer-based programming assignments

Plagiarism and collusion are problems that educators have to deal with in the design, implementation and marking of any assignment. They are particularly difficult to handle when the deliverable is computer code, which can be easily copied and its origins obfuscated. The problem is compounded when a large number of submissions are marked by different people: similarities between assignments are then easily missed, making the detection of collusion in particular very difficult. This paper describes a process for identifying similarity between different pieces of computer code, based on techniques used in bioinformatics. The process was able to measure the "degree" of relationship between different assignments and provided a visual diagram of the similarity between the various submissions. This was then used to preselect assignments for closer inspection and comparison.
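
To make the idea concrete, the following is a minimal sketch of one way such a comparison could work. The abstract does not say which bioinformatics technique is used, so this sketch assumes global sequence alignment (Needleman-Wunsch) applied to token sequences. The function names, the tokenizer and the scoring values (match, mismatch, gap) are illustrative assumptions, not the method of the paper.

```python
import re
from itertools import combinations


def tokenize(source):
    """Split source text into identifiers, numbers and single symbols.

    Whitespace is discarded, so layout changes alone do not affect the result.
    (Assumed tokenizer for illustration only.)
    """
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score between two token lists.

    The scoring values are assumptions chosen for simplicity.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Scale the alignment score to [0, 1]; 1.0 means identical token lists."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return max(0.0, alignment_score(a, b) / longest)


def rank_pairs(submissions):
    """Return (similarity, name_x, name_y) for every pair, most similar first."""
    tokens = {name: tokenize(src) for name, src in submissions.items()}
    pairs = [
        (similarity(tokens[x], tokens[y]), x, y)
        for x, y in combinations(sorted(tokens), 2)
    ]
    return sorted(pairs, reverse=True)


if __name__ == "__main__":
    # Hypothetical submissions: "a" and "b" differ only in indentation.
    demo = {
        "a": "for i in range(10):\n    total += i",
        "b": "for i in range(10):\n  total += i",
        "c": "print('hello world')",
    }
    for score, x, y in rank_pairs(demo):
        print(f"{x} vs {y}: {score:.2f}")
```

The pairs at the top of the ranking are the ones a marker would inspect first. The pairwise scores could also be passed to a clustering or tree-drawing tool to produce a diagram of how the submissions relate, although the kind of diagram used in the paper is not specified here.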