The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries

Computational research and data analytics increasingly rely on complex ecosystems of open source software (OSS) “libraries”: curated collections of reusable code that programmers import to perform a specific task. Software documentation for these libraries is crucial in helping programmers and analysts know what libraries are available and how to use them. Yet documentation for open source software libraries is widely considered low-quality. This article is a collaboration between CSCW researchers and contributors to data analytics OSS libraries, based on ethnographic fieldwork and qualitative interviews. We examine the formats, practices, and challenges of documentation in these largely volunteer-based projects. Many different kinds and formats of documentation exist around such libraries, and they play a variety of educational, promotional, and organizational roles. The work behind documentation is similarly multifaceted, including writing, reviewing, maintaining, and organizing documentation. Different aspects of documentation work require contributors to have different sets of skills and to overcome various social and technical barriers. Finally, most of our interviewees do not report high levels of intrinsic enjoyment in doing documentation work (compared to writing code). Their motivation is affected by personal and project-specific factors, such as the perceived level of credit for doing documentation work versus more ‘technical’ tasks like adding new features or fixing bugs. In studying documentation work for data analytics OSS libraries, we gain a new window into the changing practices of data-intensive research and help practitioners better understand how to support this often invisible and infrastructural work in their projects.
