Screen2Vec: Semantic Embedding of GUI Screens and GUI Components

Representing the semantics of GUI screens and components is crucial to data-driven computational methods for modeling user-GUI interactions and mining GUI designs. Existing GUI semantic representations are limited to encoding either the textual content, the visual design and layout patterns, or the app contexts. Many representation techniques also require significant manual data annotation effort. This paper presents Screen2Vec, a new self-supervised technique that uses the context of user interaction traces to generate embedding-vector representations of GUI screens and components, encoding all of the above GUI features without requiring manual annotation. Screen2Vec is inspired by the word embedding method Word2Vec, but uses a new two-layer pipeline informed by the structure of GUIs and interaction traces, and incorporates screen- and app-specific metadata. Through several sample downstream tasks, we demonstrate Screen2Vec’s key useful properties: representing between-screen similarity through nearest neighbors, composability, and the capability to represent user tasks.
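To make two of the properties named above more concrete (nearest-neighbor similarity and composability), the following is a minimal sketch of how such queries could be run over any set of screen embedding vectors. It is not the Screen2Vec model or its training pipeline: the four 4-dimensional vectors, the screen names, and the helpers `cosine_similarity`, `nearest_neighbors`, and `compose` are all hypothetical and hand-built for illustration, and cosine similarity is an assumed choice of distance measure.

```python
import math

# Hypothetical toy embeddings, hand-built for illustration only.
# Assumed axes: [search-ness, checkout-ness, app A, app B].
# Real learned embeddings would be much higher-dimensional.
EMBEDDINGS = {
    "search_a":   [2.0, 0.0, 1.0, 0.0],
    "checkout_a": [0.0, 2.0, 1.0, 0.0],
    "search_b":   [2.0, 0.0, 0.0, 1.0],
    "checkout_b": [0.0, 2.0, 0.0, 1.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, embeddings, k=1, exclude=()):
    """Return the k (name, similarity) pairs most similar to `query`."""
    scored = [
        (name, cosine_similarity(query, vec))
        for name, vec in embeddings.items()
        if name not in exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def compose(plus, minus):
    """Add the vectors in `plus` and subtract those in `minus`, element-wise."""
    result = [0.0] * len(plus[0])
    for vec in plus:
        result = [r + v for r, v in zip(result, vec)]
    for vec in minus:
        result = [r - v for r, v in zip(result, vec)]
    return result


# Between-screen similarity: which other screen is closest to app A's search screen?
similar = nearest_neighbors(EMBEDDINGS["search_a"], EMBEDDINGS, k=1,
                            exclude=("search_a",))

# Composability: checkout_a - search_a + search_b should land near checkout_b.
query = compose(plus=[EMBEDDINGS["checkout_a"], EMBEDDINGS["search_b"]],
                minus=[EMBEDDINGS["search_a"]])
analogy = nearest_neighbors(query, EMBEDDINGS, k=1,
                            exclude=("checkout_a", "search_a", "search_b"))

print(similar[0][0], analogy[0][0])
```

In this toy setup the task-related axes are weighted more heavily than the app axes, so the search screen of one app sits closest to the search screen of the other. Whether learned embeddings behave this cleanly depends on the model and data, which this sketch does not attempt to reproduce.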
