Automating App Review Response Generation

Previous studies have shown that replying to a user review usually has a positive effect on the rating the user gives the app. For example, Hassan et al. found that responding to a review increases the chances of a user updating their rating by up to six times compared with not responding. To reduce the labor of replying to the bulk of user reviews, developers usually adopt a template-based strategy, in which the templates express appreciation for using the app or give the company email address so that users can follow up. However, reading a large number of user reviews every day is not an easy task for developers, so more automation is needed to help them respond. To address this need, we propose RRGen, a novel approach that automatically generates review responses by learning knowledge relations between reviews and their responses. RRGen explicitly incorporates review attributes, such as user rating and review length, and learns the relations between reviews and their corresponding responses in a supervised way from the available training data. Experiments on 58 apps and 309,246 review-response pairs show that RRGen outperforms the baselines by at least 67.4% in terms of BLEU-4 (an accuracy measure widely used to evaluate dialogue response generation systems). Qualitative analysis also confirms the effectiveness of RRGen in generating relevant and accurate responses.
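For readers unfamiliar with BLEU-4, the following is a minimal sketch of how the score can be computed for a single generated response against a single reference response. It is an illustration only: the function names `ngrams` and `bleu4`, the whitespace tokenization, the single-reference setting, and the absence of smoothing are all assumptions made here, and the evaluation setup actually used for RRGen may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Unsmoothed sentence-level BLEU-4 against one reference (illustrative)."""
    # Assumption: plain whitespace tokenization.
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, any zero precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the four modified n-gram precisions.
    return bp * math.exp(sum(log_precisions) / 4)
```

A generated response identical to the developer's actual response scores 1.0, and a response sharing no words with it scores 0.0; partial overlap falls in between, with longer matching word sequences rewarded more.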
