Apply computer vision in GUI automation for industrial applications.

Rapid advances in technology have reshaped the workplace and transformed how we work. In the pursuit of Industry 4.0, we build smart machines and robots to replace manual labor. As machines take over manual labor, humans in many cases become desktop software users. Jobs such as testing, quality inspection, data monitoring, data entry, and routine editing are still done by humans in front of desktop computers. Operating a software application can, in principle, be reduced to understanding screen output and performing mouse and keyboard operations. When these jobs are repetitive, tedious, and monotonous, they can be taken over by GUI automation techniques. GUI automation can be achieved with different underlying technologies, each with its own pros and cons. In this paper, we describe a tool, Korat, which uses computer vision to achieve maximum cross-platform capability for industrial applications, including test automation and robotic process automation. Although Korat has been successfully adopted by several industrial customers, difficult problems remain to be addressed. This paper discusses the problems and difficulties of applying computer vision to GUI automation, particularly our experience of applying open-source OCR to color screenshots. By introducing critical pre-processing stages and algorithms, the recognition rate increases significantly, making the approach feasible for practical use.
