Summarizing Complex Development Artifacts by Mining Heterogeneous Data

Summarization is a promising way to reduce the amount of information a developer must absorb to understand development artifacts such as source code, bug reports, and emails. However, existing approaches treat artifacts as purely textual entities, disregarding the heterogeneous and partially structured nature of most artifacts, which interleave pieces of distinct types such as source code, diffs, stack traces, and natural language. We present a novel approach that augments existing summarization techniques (such as LexRank) to handle the heterogeneous and multidimensional nature of complex artifacts. Our preliminary results on heterogeneous artifacts suggest that our approach outperforms current text-based approaches.
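The abstract does not spell out how a text-based summarizer is augmented, so the following is only an illustrative sketch and not the authors' method. It assumes that each line of an artifact can be tagged with a type by simple regular expressions, that a simplified LexRank (thresholded cosine similarity over raw term frequencies rather than TF-IDF, followed by damped power iteration) ranks the lines, and that a per-type weight rescales the ranks. The names `classify`, `lexrank`, `summarize`, and `TYPE_WEIGHTS`, as well as all patterns and weight values, are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical per-type patterns: illustrative assumptions only.
STACK_RE = re.compile(r"^\s*at\s+[\w.$]+\(.*\)\s*$")
DIFF_RE = re.compile(r"^(\+\+\+|---|@@|[+-])")
CODE_RE = re.compile(r"[;{}]\s*$|^\s*(import|public|private|def|class)\b")

# Hypothetical weights expressing how much each type should count.
TYPE_WEIGHTS = {"text": 1.0, "code": 0.8, "diff": 0.6, "stack_trace": 0.4}


def classify(line):
    """Tag a line as stack_trace, diff, code, or text."""
    if STACK_RE.search(line):
        return "stack_trace"
    if DIFF_RE.search(line):
        return "diff"
    if CODE_RE.search(line):
        return "code"
    return "text"


def tokenize(line):
    """Lowercase identifier-like tokens; pure numbers are dropped."""
    return re.findall(r"[a-z_]\w*", line.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def lexrank(units, threshold=0.1, damping=0.85, iterations=50):
    """Simplified LexRank: centrality on a thresholded similarity graph."""
    n = len(units)
    if n == 0:
        return []
    vectors = [Counter(tokenize(u)) for u in units]
    neighbors = [
        [j for j in range(n)
         if j != i and cosine(vectors[i], vectors[j]) >= threshold]
        for i in range(n)
    ]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for i, outs in enumerate(neighbors):
            # Isolated units spread their score uniformly over all units.
            targets = outs if outs else range(n)
            share = damping * scores[i] / len(targets)
            for j in targets:
                new[j] += share
        scores = new
    return scores


def summarize(artifact, k=2):
    """Return the k highest-ranked lines, kept in their original order."""
    units = [ln for ln in artifact.splitlines() if ln.strip()]
    scores = lexrank(units)
    weighted = [s * TYPE_WEIGHTS[classify(u)] for s, u in zip(scores, units)]
    top = sorted(range(len(units)), key=lambda i: weighted[i], reverse=True)[:k]
    return [units[i] for i in sorted(top)]
```

Calling `summarize(report_text, k=3)` on a bug report that mixes prose with a stack trace would return three lines of the report. Setting every entry of `TYPE_WEIGHTS` to 1.0 reduces the sketch to a purely text-based ranking, which makes it easy to compare the two behaviours on the same artifact.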
