Remote, but Connected

Data science practitioners face the challenge of continually honing skills such as data wrangling and visualization. As data scientists seek online spaces to network, learn, and share resources with one another, each individual must devise their own ad hoc strategy for practicing these skills. Given these disjointed efforts, it is crucial to ask: how can we build an inclusive, welcoming online community of practice that unites data scientists in their collective efforts to become experts? Daily hashtags, which are Twitter hashtags tied to a specific day of the week, have shown promise in forming communities of practice (CoPs) on social networking sites, but how do they benefit a community and its members? To understand how daily hashtags benefit data scientists and form an online CoP, we conducted a qualitative study of #TidyTuesday, a daily hashtag project for data scientists who use R, with the CoP framework as our analytical lens. Through semi-structured interviews with 26 participants, we uncovered their motivations for participating in #TidyTuesday, how the project benefited them, and how it cultivated an online CoP. Our findings contribute to CSCW research on communities of practice by identifying the design trade-offs of using daily hashtags on Twitter and by offering guidelines for growing and sustaining an online community of practice for data scientists.