Thesis for the Degree of Doctor of Philosophy. Visual GUI Testing: Automating High-Level Software Testing in Industrial Practice

Software Engineering is on the verge of a new era where continuous releases are becoming more common than planned long-term projects. In this context, test automation will become essential on all levels of system abstraction to meet the market's demands on time-to-market and quality. Hence, automated tests are required from low-level software components, tested with unit tests, up to the pictorial graphical user interface (GUI), tested with user-emulated system and acceptance tests. Thus far, research has provided industry with a plethora of automation solutions for lower-level testing, but GUI-level testing is still primarily a manual, and therefore costly and tedious, activity in practice. We have identified three generations of automated GUI-based testing. The first (1st) generation relies on GUI coordinates but is not used in practice due to unfeasible maintenance costs caused by fragility to GUI change. Second (2nd) generation tools instead operate against the system's GUI architecture, libraries or application programming interfaces. Whilst this approach is successfully used in practice, it does not verify the GUI's appearance and it is restricted to specific GUI technologies, programming languages and platforms. The third (3rd) generation, referred to as Visual GUI Testing (VGT), is an emerging technique in industrial practice with properties that mitigate the challenges experienced with previous techniques. VGT is defined as a tool-driven test technique where image recognition is used to interact with, and assert, a system's behavior through its pictorial GUI as it is shown to the user in user-emulated, automated, system or acceptance tests. These automated tests produce results of quality on par with a human tester and are therefore an effective complement that reduces the aforementioned challenges with manual testing.
However, despite its benefits, the technique is only sparsely used in industry and the academic body of knowledge contains little empirical support for the technique's industrial viability. This thesis presents a broad evaluation of VGT's capabilities, obtained through a series of case studies and experiments performed in academia and Swedish industry. The research follows an incremental methodology that began with experimentation with VGT, followed by industrial studies that were concluded with a study of VGT's use at a company over several years. Results of the research show that VGT is viable for use in industrial practice with better defect-finding ability than manual tests, ability to test any GUI-based system, high learnability, feasible maintenance costs and both short- and long-term company benefits. However, there are still challenges associated with the successful adoption, use and long-term use of VGT in a company, the most crucial being that suitable development and maintenance practices are used. This thesis thereby concludes that VGT can be used in industrial practice and aims to provide guidance to practitioners who seek to do so. Additionally, this work aims to be a stepping stone for academia to explore new test solutions that build on image recognition technology to improve the state of the art.
