Understanding developers' natural language queries with interactive clarification

When performing software maintenance tasks, developers often need background knowledge drawn from information scattered across different software repositories, such as source code, version control systems, and bug tracking systems. An effective way to help developers acquire such knowledge is to provide an integrated knowledge base and allow them to ask questions in natural language. Existing approaches offer limited support for natural language questions that involve a chain of conceptual relationships and are phrased flexibly. In this paper, we propose an interactive approach for understanding developers' natural language queries. The approach handles questions phrased in different ways by generating a ranked set of human-readable candidate questions and obtaining feedback from the developer. Based on the candidate question the developer confirms, the approach synthesizes an answer by constructing a structural query and executing it against the knowledge base. We have implemented a tool that follows the proposed approach and used it in a user study. The results show that our approach helps developers get the desired answers more easily and accurately.
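As a rough illustration of the interaction loop described above (rank candidate questions, let the developer confirm one, then obtain the structural query to execute), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Candidate` type, the sample questions, the predicates such as `:fixes`, and the SPARQL-style query strings are hypothetical, and token-overlap (Jaccard) ranking is a simple stand-in, not the ranking method of the proposed approach.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A human-readable candidate question paired with a structural query."""
    question: str
    query: str


# Hypothetical candidates; the predicates (:fixes, :modifies, :changes) and the
# SPARQL-style syntax are invented for this sketch, not taken from the paper.
CANDIDATES = [
    Candidate("Which commits fix bug X?", "SELECT ?c WHERE { ?c :fixes :X }"),
    Candidate("Which developers modified class X?", "SELECT ?d WHERE { ?d :modifies :X }"),
    Candidate("Which classes are changed by commit X?", "SELECT ?k WHERE { :X :changes ?k }"),
]


def tokens(text):
    """Lowercase a string and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank(user_question, candidates):
    """Order candidates by Jaccard similarity to the developer's question.

    Jaccard overlap is a stand-in for whatever ranking a real system uses.
    """
    asked = tokens(user_question)

    def score(candidate):
        cand = tokens(candidate.question)
        union = asked | cand
        return len(asked & cand) / len(union) if union else 0.0

    return sorted(candidates, key=score, reverse=True)


def confirm(ranked, choice):
    """Return the structural query of the candidate the developer picked."""
    return ranked[choice].query


if __name__ == "__main__":
    ranked = rank("who modified the class Parser", CANDIDATES)
    for index, candidate in enumerate(ranked):
        print(index, candidate.question)
    # Assume the developer confirms the top-ranked candidate.
    print(confirm(ranked, 0))
```

In a real tool the confirmed query would be executed against the integrated knowledge base, and the chosen candidate could also be fed back to improve later rankings; both steps are omitted here.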
