Web-scale data gathering with BlueJ

Many investigations of students' initial learning of programming are based on small-scale studies of their interactions with a learning environment. Although this research has significantly improved the understanding of student behaviour (and tool support), it has often been restricted to small numbers of students at single institutions. This paper describes an initiative to instrument the widely used BlueJ environment to collect data on a much larger scale, and to make that data available to Computing Education researchers. The availability of this data has the potential to enable research not previously possible. This paper discusses the type of data that will be gathered, the restrictions placed on identifying students, and mechanisms for associating the data with contextual data gathered outside the scope of the initiative.