Mining developers' communication to assess software quality: promises, challenges, perils

In recent years, researchers have been building models that rely on a wide variety of data extracted from software repositories, for example characteristics of source code changes or data related to bug introduction and fixing. Software repositories also contain a huge amount of unstructured information, often expressed in natural language, concerning communication between developers, as well as tags, commit notes, and comments that developers produce during their activities. This keynote illustrates, on the one hand, how explanatory or predictive models built upon software repositories could be enhanced by integrating them with the analysis of communication among developers. On the other hand, the keynote warns against the perils of doing so, due to the intrinsic imprecision and incompleteness of such textual information, and explains how such problems could, at least, be mitigated.
