Detecting speech act types in developer question/answer conversations during bug repair

This paper addresses the problem of detecting speech acts in conversations about bug repair. We conduct a "Wizard of Oz" experiment with 30 professional programmers, in which the programmers fix bugs for two hours and use a simulated virtual assistant for help. We then use an open coding manual annotation procedure to identify the speech act types in the conversations. Finally, we train and evaluate a supervised learning algorithm to detect those speech act types automatically. Across the 30 two-hour conversations, we made 2459 annotations and uncovered 26 speech act types. Our automated detection achieved 69% precision and 50% recall. The key application of this work is to advance the state of the art for virtual assistants in software engineering. Virtual assistant technology is growing rapidly, but applications in software engineering lag behind those in other areas, largely because of a lack of relevant data and experiments. This paper addresses that gap for developer Q/A conversations about bug repair.
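To make the reported precision and recall concrete, the following is a minimal sketch of how the two metrics can be computed when each utterance may carry more than one speech act label. It assumes micro-averaging over all (utterance, label) pairs, which the abstract does not specify. The function name `micro_precision_recall` and the label names are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple


def micro_precision_recall(
    gold: List[Set[str]], pred: List[Set[str]]
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over per-utterance label sets.

    Assumption: micro-averaging is used here only as an illustration;
    the averaging scheme behind the reported figures is not stated.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same utterances")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and actually annotated
        fp += len(p - g)   # labels predicted but not annotated
        fn += len(g - p)   # labels annotated but not predicted

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical annotations for four utterances (label names are made up).
gold = [{"question"}, {"statement"}, {"answer", "confirmation"}, {"request"}]
pred = [{"question"}, {"question"}, {"answer"}, set()]

precision, recall = micro_precision_recall(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example the detector is more often right when it does predict a label (precision) than it is complete in finding all annotated labels (recall), which is the same qualitative pattern as the 69% precision and 50% recall reported above.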
