Extracting problematic API features from forum discussions

Software engineering activities often produce large amounts of unstructured data. Useful information can be extracted from such data to facilitate software development activities, such as bug report management and documentation provision. Online forums, in particular, contain extensive valuable information that can aid software development. However, no work has been done to extract problematic API features from online forums. In this paper, we investigate ways to extract the API features that are discussed as a source of difficulty in each thread, using natural language processing and sentiment analysis techniques. Based on a preliminary manual analysis of the content of a discussion thread and a categorization of the role of each sentence therein, we focus on a negative-sentiment sentence and its close neighbors as the unit for extracting API features. We evaluate a set of candidate solutions by comparing tool-extracted problematic API design features with manually produced golden test data. Our best solution yields a precision of 89%. We also investigate three potential applications of our feature extraction solution: (i) highlighting the negative sentence and its neighbors to help illustrate the main API feature; (ii) searching for helpful online information using the extracted API feature as a query; (iii) summarizing the problematic features to reveal the “hot topics” in a forum.
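The extraction unit and the evaluation measure described above can be illustrated with a minimal sketch. It is not the paper's implementation: the keyword lexicon `NEGATIVE_WORDS`, the naive sentence splitter, the `window` parameter, and all function names are assumptions made for illustration, standing in for the natural language processing and sentiment analysis techniques that the paper applies.

```python
import re

# Hypothetical toy lexicon (an assumption); it stands in for a real
# sentiment analysis step, which is not specified in this abstract.
NEGATIVE_WORDS = {"not", "cannot", "fail", "fails", "error", "problem", "wrong", "unable"}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_negative(sentence):
    """Flag a sentence as negative if it contains any lexicon word."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return bool(words & NEGATIVE_WORDS)


def extraction_units(text, window=1):
    """Return each negative sentence together with `window` neighbors on each side."""
    sentences = split_sentences(text)
    units = []
    for i, sentence in enumerate(sentences):
        if is_negative(sentence):
            start = max(0, i - window)
            end = min(len(sentences), i + window + 1)
            units.append(sentences[start:end])
    return units


def precision(extracted, golden):
    """Fraction of extracted features that also appear in the golden set."""
    extracted = set(extracted)
    if not extracted:
        return 0.0
    return len(extracted & set(golden)) / len(extracted)


# Made-up forum post, used only to exercise the sketch.
post = (
    "I am using JTable in a scroll pane. "
    "The column header does not show up. "
    "I tried calling setVisible on it. "
    "Thanks for any help."
)
units = extraction_units(post)
score = precision({"column header", "scroll pane"}, {"column header"})
```

In this sketch, API features would then be extracted from the sentences in each unit rather than from the whole thread, and `precision` compares the extracted features against a manually produced golden set.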
