From UI Design Image to GUI Skeleton: A Neural Machine Translator to Bootstrap Mobile GUI Implementation

A GUI skeleton is the starting point for implementing a UI design image. To obtain a GUI skeleton from a UI design image, developers have to visually understand the UI elements and their spatial layout in the image, and then translate this understanding into proper GUI components and their compositions. Automating this visual understanding and translation would be beneficial for bootstrapping mobile GUI implementation, but it is a challenging task due to the diversity of UI designs and the complexity of the GUI skeletons to be generated. Existing tools are rigid because they depend on heuristically designed rules for visual understanding and GUI generation. In this paper, we present a neural machine translator that combines recent advances in computer vision and machine translation to translate a UI design image into a GUI skeleton. Our translator learns to extract visual features from UI images, encode the spatial layouts of these features, and generate GUI skeletons in a unified neural network framework, without requiring manual rule development. To train our translator, we develop an automated GUI exploration method that collects large-scale UI data from real-world applications. We carry out extensive experiments to evaluate the accuracy, generality and usefulness of our approach.
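
To make the notion of a GUI skeleton concrete, the following minimal sketch models a skeleton as a tree of GUI components and flattens it into a token sequence of the kind a sequence-generating translator could emit. The names `Component` and `to_tokens` and the bracketed token format are illustrative assumptions, not the representation defined in the paper; the component names are ordinary Android widget and layout class names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """Hypothetical skeleton node: a component type plus nested children."""
    kind: str
    children: List["Component"] = field(default_factory=list)


def to_tokens(node: Component) -> List[str]:
    """Flatten a component tree depth-first into a bracketed token sequence.

    The bracket notation is an assumption made for illustration only.
    """
    if not node.children:
        return [node.kind]
    tokens = [node.kind, "{"]
    for child in node.children:
        tokens.extend(to_tokens(child))
    tokens.append("}")
    return tokens


# A small example skeleton: a vertical container holding an image and
# a nested container with a text label and a button.
skeleton = Component("LinearLayout", [
    Component("ImageView"),
    Component("RelativeLayout", [Component("TextView"), Component("Button")]),
])

print(" ".join(to_tokens(skeleton)))
```

Under this assumed encoding, the nesting of containers is preserved by the brackets, so the tree can be recovered from the flat sequence.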
