The Secret Life of Hackathon Code: Where does it come from and where does it go?

Background: Hackathons have become popular events for teams to collaborate on projects and develop software prototypes. Most existing research focuses on activities during an event, with limited attention to the evolution of the code brought to or created during a hackathon.

Aim: We aim to understand the evolution of hackathon-related code, specifically how much hackathon teams rely on pre-existing code and how much new code they develop during a hackathon. Moreover, we aim to understand if and where that code gets reused, and what factors affect reuse.

Method: We collected information about 22,183 hackathon projects from Devpost, a hackathon database, and obtained related code (blobs), authors, and project characteristics from the World of Code. We investigated whether code blobs in hackathon projects were created before, during, or after an event by identifying each blob's original creation date and author, and we also checked whether the original author was a member of the hackathon project. We tracked code reuse by first identifying all commits containing blobs created during an event and then determining all projects that contain those commits.

Result: While only approximately 9.14% of the code blobs are created during hackathons, this amount is still significant considering the time and team-size constraints of such events. Approximately a third of these code blobs get reused in other projects. The number of associated technologies and the number of participants in a project increase the probability of reuse.

Conclusion: Our study demonstrates to what extent pre-existing code is used and new code is created during a hackathon, and how much of it is reused elsewhere afterwards. Our findings contribute to a better understanding of code reuse as a phenomenon and of the role of hackathons in this context, and they can serve as a starting point for further studies in this area.
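The Method paragraph combines two steps: classifying each blob by when it first appeared relative to the event window (and whether its first author was a team member), and following blobs created during the event to the commits, and then the projects, that contain them. The Python sketch below illustrates both steps under simplifying assumptions. The names `Blob`, `Hackathon`, `classify_blob`, and `find_reusing_projects` are hypothetical. Each blob's earliest commit timestamp and resolved author ID are assumed to be precomputed, and plain dictionaries stand in for World of Code's blob-to-commit and commit-to-project lookups. This is an illustration, not the authors' actual pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Iterable, Mapping


class Phase(Enum):
    """When a blob first appeared relative to a hackathon."""
    BEFORE = "before"
    DURING = "during"
    AFTER = "after"


@dataclass(frozen=True)
class Blob:
    sha: str
    first_seen: datetime  # assumed: timestamp of the earliest commit containing the blob
    first_author: str     # assumed: resolved author ID of that earliest commit


@dataclass(frozen=True)
class Hackathon:
    start: datetime
    end: datetime            # treated as inclusive in this sketch
    members: frozenset[str]  # author IDs of the project's team members


def classify_blob(blob: Blob, event: Hackathon) -> tuple[Phase, bool]:
    """Return the blob's creation phase and whether a team member created it."""
    if blob.first_seen < event.start:
        phase = Phase.BEFORE
    elif blob.first_seen <= event.end:
        phase = Phase.DURING
    else:
        phase = Phase.AFTER
    return phase, blob.first_author in event.members


def find_reusing_projects(
    blob_shas: Iterable[str],
    blob_to_commits: Mapping[str, Iterable[str]],
    commit_to_projects: Mapping[str, Iterable[str]],
    origin_project: str,
) -> set[str]:
    """Follow blob -> commits -> projects, then drop the originating project."""
    projects: set[str] = set()
    for sha in blob_shas:
        for commit in blob_to_commits.get(sha, ()):
            projects.update(commit_to_projects.get(commit, ()))
    projects.discard(origin_project)
    return projects


if __name__ == "__main__":
    event = Hackathon(datetime(2020, 3, 1), datetime(2020, 3, 3), frozenset({"alice", "bob"}))
    blobs = [
        Blob("b1", datetime(2019, 12, 1), "carol"),  # pre-existing code
        Blob("b2", datetime(2020, 3, 2), "alice"),   # written at the event
        Blob("b3", datetime(2020, 4, 1), "bob"),     # added afterwards
    ]
    # Only blobs created during the event are traced for reuse.
    during = [b.sha for b in blobs if classify_blob(b, event)[0] is Phase.DURING]
    share = len(during) / len(blobs)
    reused_in = find_reusing_projects(
        during,
        {"b2": ["c1", "c2"]},
        {"c1": ["hack/app"], "c2": ["hack/app", "other/lib"]},
        origin_project="hack/app",
    )
    print(f"created during event: {share:.2%}; reused in: {sorted(reused_in)}")
```

Treating the event end as inclusive and excluding only the originating project are choices made for this illustration. The study may define the event window and what counts as an "other project" (for example, with respect to forks) differently.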