ActionNet: Vision-Based Workflow Action Recognition From Programming Screencasts

Programming screencasts have two important applications in software engineering: studying developer behaviors and information needs, and disseminating software engineering knowledge. Although programming screencasts are easy to produce, they are hard to analyze or index because the data consists of images. Existing techniques extract only content from screencasts and ignore the workflow actions by which developers accomplish programming tasks. This significantly limits the effective use of programming screencasts in downstream applications. In this paper, we present the first technique for recognizing workflow actions in programming screencasts. Our technique uses image differencing and a Convolutional Neural Network (CNN) to analyze the correspondence and change between consecutive frames, from which nine classes of frequent developer actions can be recognized. Using programming screencasts from YouTube, we evaluate different configurations of our CNN model and the performance of our technique for developer action recognition across developers, working environments, and programming languages. Using screencasts of developers' real work, we demonstrate the usefulness of our technique in a practical application: action-aware extraction of key-code frames from developers' work.
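To make the image-differencing step concrete, the following is a minimal sketch of how the changed region between two consecutive frames might be located. It is an illustration under stated assumptions, not the implementation used in the paper: the function name `changed_region`, the `threshold` parameter, and the bounding-box output are all hypothetical, and the CNN classification stage is not shown.

```python
import numpy as np


def changed_region(prev_frame, next_frame, threshold=0):
    """Return the bounding box (top, left, bottom, right) of pixels that
    differ between two frames, or None if the frames are identical.

    Hypothetical helper for illustration; bottom and right are exclusive.
    """
    # Use a signed type so the subtraction of uint8 values cannot wrap around.
    diff = np.abs(prev_frame.astype(np.int16) - next_frame.astype(np.int16))
    # For colour frames, collapse the channel axis to one value per pixel.
    if diff.ndim == 3:
        diff = diff.max(axis=2)
    mask = diff > threshold
    if not mask.any():
        return None
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1


# Two synthetic 10x12 grayscale "frames": a small patch changes between them.
frame_a = np.zeros((10, 12), dtype=np.uint8)
frame_b = frame_a.copy()
frame_b[2:4, 5:9] = 255
box = changed_region(frame_a, frame_b)
```

In a pipeline like the one described above, a region such as `box` could be cropped from both frames and passed to a classifier; how the paper actually feeds the change information to its CNN is not specified in this abstract.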
