A Neural Question Answering System for Basic Questions about Subroutines

A question answering (QA) system is a type of conversational AI that generates natural language answers to questions posed by human users. QA systems often form the backbone of interactive dialogue systems and have been studied extensively for tasks ranging from restaurant recommendations to medical diagnostics. Dramatic progress has been made in recent years, especially through encoder-decoder neural architectures trained on large datasets. In this paper, we take initial steps toward bringing state-of-the-art neural QA technologies to software engineering applications by designing a context-based QA system for basic questions about subroutines. We curate a training dataset of 10.9 million question/context/answer tuples based on rules we extract from recent empirical studies. Then, we train a custom neural QA model with this dataset and evaluate the model in a study with professional programmers. We demonstrate the strengths and weaknesses of the system and lay the groundwork for its eventual use in dialogue systems for software engineering.
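The abstract does not list the rules used to build the dataset. The sketch below is therefore only an illustration of what a rule-generated question/context/answer tuple might look like, and it is not the paper's method. It assumes a hypothetical rule that reads a Java method's signature (the context) and emits templated questions about its return type and parameter count. The names `QATuple` and `make_tuples` are our own.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class QATuple:
    question: str  # natural language question about the subroutine
    context: str   # the subroutine source the answer is grounded in
    answer: str    # templated natural language answer


# Hypothetical rule: locate a simple Java method signature
# (optional access modifier, optional "static", return type, name, parameter list).
SIGNATURE = re.compile(
    r"(?:public|private|protected)?\s*(?:static\s+)?([\w<>\[\]]+)\s+(\w+)\s*\(([^)]*)\)"
)


def make_tuples(source: str) -> list[QATuple]:
    """Generate templated QA tuples for one method; returns [] if no signature is found."""
    match = SIGNATURE.search(source)
    if match is None:
        return []
    ret_type, name, params = match.group(1), match.group(2), match.group(3).strip()

    if ret_type == "void":
        ret_answer = f"{name} does not return a value."
    else:
        ret_answer = f"{name} returns a value of type {ret_type}."

    # Naive count: splitting on commas miscounts generic types such as Map<K, V>.
    count = len(params.split(",")) if params else 0
    plural = "" if count == 1 else "s"

    return [
        QATuple(f"What does {name} return?", source, ret_answer),
        QATuple(f"How many parameters does {name} take?", source,
                f"{name} takes {count} parameter{plural}."),
    ]


if __name__ == "__main__":
    for t in make_tuples("public int add(int a, int b) { return a + b; }"):
        print(t.question, "->", t.answer)
```

Running the script prints one line per generated tuple, such as `What does add return? -> add returns a value of type int.` A real pipeline would need a proper parser instead of a regular expression and many more question templates. The sketch only shows how a single context can yield several training tuples.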