Contextual Documentation Referencing on Stack Overflow

Software engineering is knowledge-intensive and requires software developers to continually search for knowledge, often on community question answering platforms such as Stack Overflow. Such information-sharing platforms do not exist in isolation: the hyperlinks to other documentation resources commonly found in forum posts are part of the evidence that they belong to a broader software documentation ecosystem. With the goal of improving information diffusion between Stack Overflow and other documentation resources, we conducted a study of how and why documentation is referenced in Stack Overflow threads. We sampled and classified 759 links from two domains, regular expressions and Android development, to qualitatively and quantitatively analyze the links' context and purpose, including attribution, awareness, and recommendations. We found that links on Stack Overflow serve a wide range of distinct purposes, from citation links attributing content copied into Stack Overflow, through links to Wikipedia pages that clarify concepts, to recommendations of software components and resources for background reading. This spectrum of purposes has major corollaries, including our observation that links to documentation resources reflect the information needs typical of a technology domain. We contribute a framework and method for analyzing the context and purpose of Stack Overflow links, a public dataset of annotated links, and a description of five major observations about linking practices on Stack Overflow. We further point to potential tool support for enhancing information diffusion between Stack Overflow and other documentation resources.
