AppFlow: using machine learning to synthesize robust, reusable UI tests

UI testing is known to be difficult, especially as today’s development cycles become faster. Manual UI testing is tedious, costly, and error-prone. Automated UI tests are costly to write and maintain. This paper presents AppFlow, a system for synthesizing highly robust, highly reusable UI tests. It leverages machine learning to automatically recognize common screens and widgets, relieving developers of writing ad hoc, fragile logic to use them in tests. It enables developers to write a library of modular tests for the main functionality of an app category (e.g., an “add to cart” test for shopping apps). It can then quickly test a new app in the same category by synthesizing full tests from the modular ones in the library. By focusing on the main functionality, AppFlow provides “smoke testing” that requires little manual work. Optionally, developers can customize AppFlow by adding app-specific tests for completeness. We evaluated AppFlow on 60 popular apps in the shopping and news categories, in two case studies on the BBC news app and the JackThreads shopping app, and in a user study with 15 subjects on the Wish shopping app. Results show that AppFlow accurately recognizes screens and widgets, synthesizes highly robust and reusable tests, covers 46.6% of all automatable tests for JackThreads with the tests it synthesizes, and reduces the effort to test a new app by up to 90%. Interestingly, it found eight bugs in the evaluated apps, including seven functionality bugs, even though the apps were publicly released and had presumably gone through thorough testing.
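
Synthesizing a full test from modular ones can be pictured as a search problem. The sketch below is our own simplified illustration, not AppFlow's implementation. It assumes that each modular test declares the abstract app-state properties it requires and the properties it establishes or removes. It then runs a breadth-first search for the shortest chain of library tests that makes a target test runnable. The names `ModularTest`, `requires`, `effects`, `removes`, `synthesize`, and the shopping-app tests in `LIBRARY` are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ModularTest:
    """Hypothetical modular test: runnable only when every property in
    `requires` holds; afterwards the properties in `effects` hold and
    those in `removes` no longer do (an assumed state model)."""
    name: str
    requires: frozenset
    effects: frozenset
    removes: frozenset = frozenset()


def synthesize(library: Iterable[ModularTest], start: Iterable[str],
               goal: ModularTest) -> Optional[List[str]]:
    """Breadth-first search over abstract app states for the shortest chain
    of library tests after which `goal` can run; returns the test names,
    or None if no chain exists."""
    library = list(library)
    start_state = frozenset(start)
    queue = deque([(start_state, [])])
    seen = {start_state}
    while queue:
        state, path = queue.popleft()
        if goal.requires <= state:  # goal's preconditions are satisfied
            return path + [goal.name]
        for test in library:
            if test is goal or not test.requires <= state:
                continue  # skip the goal itself and tests not yet runnable
            nxt = (state - test.removes) | test.effects
            if nxt not in seen:  # expand each abstract state only once
                seen.add(nxt)
                queue.append((nxt, path + [test.name]))
    return None


# Hypothetical modular-test library for the shopping-app category.
LIBRARY = [
    ModularTest("open_app", frozenset(), frozenset({"home"})),
    ModularTest("sign_in", frozenset({"home"}), frozenset({"signed_in"})),
    ModularTest("search_item", frozenset({"home"}), frozenset({"item_detail"})),
    ModularTest("add_to_cart", frozenset({"item_detail", "signed_in"}),
                frozenset({"cart_nonempty"})),
]

if __name__ == "__main__":
    print(synthesize(LIBRARY, [], LIBRARY[3]))
    # -> ['open_app', 'sign_in', 'search_item', 'add_to_cart']
```

Breadth-first search is used here because it returns a shortest chain, which keeps each synthesized test short. Representing app state as a set of named properties is likewise a simplifying assumption of this sketch.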
