A Dataset of Scratch Programs: Scraped, Shaped and Scored

Scratch is increasingly popular, both as an introductory programming language and as a subject of computing education research. In this paper, we present a dataset of 250K recent Scratch projects from 100K different authors, scraped from the Scratch project repository. We processed the projects' source code and metadata and encoded them into a database that facilitates querying and further analysis. We also evaluated the projects in terms of programming skills and mastery, and include the resulting project scores in the dataset. The dataset enables analysis of the source code of Scratch projects, of their quality characteristics, and of the programming skills that their authors exhibit. It can be used for empirical research in software engineering and computing education.
