Deficient documentation detection: a methodology to locate deficient project documentation using topic analysis

A project's documentation is the primary source of information for developers using that project. With hundreds of thousands of programming-related questions posted on programming Q&A websites such as Stack Overflow, we question whether developer-written documentation provides enough guidance for programmers. In this study, we wanted to know whether any topics are inadequately covered by project documentation. We combined questions from Stack Overflow with documentation from the PHP and Python projects, applied topic analysis to the combined data using latent Dirichlet allocation (LDA), and identified topics in Stack Overflow that did not overlap with the project documentation. This approach successfully located topics with deficient project documentation. We also found topics in need of tutorial documentation that fall outside the scope of either the PHP or the Python project, such as MySQL and HTML.
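
The abstract does not state the exact criterion used to decide that a Stack Overflow topic "does not overlap" the documentation, so the following is only a minimal sketch under stated assumptions. It assumes a single LDA model has already been fitted on the combined corpus (by any LDA implementation) and that each document's topic proportions (its theta row) are available, split into Stack Overflow questions and documentation pages. The functions `mean_topic_weights` and `deficient_topics` and the thresholds `min_so_weight` and `max_ratio` are hypothetical: a topic is flagged when it carries noticeable weight in questions but much less weight in the documentation.

```python
from typing import List, Sequence


def mean_topic_weights(theta: Sequence[Sequence[float]]) -> List[float]:
    """Average per-topic proportion over a set of documents (rows of theta)."""
    if not theta:
        raise ValueError("need at least one document")
    k = len(theta[0])
    totals = [0.0] * k
    for row in theta:
        if len(row) != k:
            raise ValueError("all rows must have the same number of topics")
        for j, w in enumerate(row):
            totals[j] += w
    return [t / len(theta) for t in totals]


def deficient_topics(
    theta_so: Sequence[Sequence[float]],
    theta_doc: Sequence[Sequence[float]],
    min_so_weight: float = 0.05,  # hypothetical threshold
    max_ratio: float = 0.1,       # hypothetical threshold
) -> List[int]:
    """Return indices of topics common in questions but rare in documentation.

    Topic k is flagged when its mean weight over Stack Overflow questions is at
    least min_so_weight and its mean weight over documentation pages is below
    max_ratio times that value. This criterion is an illustrative assumption.
    """
    so = mean_topic_weights(theta_so)
    doc = mean_topic_weights(theta_doc)
    if len(so) != len(doc):
        raise ValueError("both corpora must come from the same topic model")
    return [
        k
        for k, (s, d) in enumerate(zip(so, doc))
        if s >= min_so_weight and d < max_ratio * s
    ]
```

With made-up proportions for three topics, `deficient_topics([[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]], [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]])` returns `[2]`: topic 2 appears in questions but never in the documentation pages. Comparing mean weights, rather than summed weights, keeps the check from being dominated by whichever corpus has more documents.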
