Recognizing Handwritten Source Code

Supporting programming on touchscreen devices requires effective methods for text input and editing. Unfortunately, virtual keyboards can be inefficient and consume valuable screen space on already small devices. Recent advances in stylus input make handwriting a potentially viable text input method for programming on such devices. The primary barrier, however, is that handwriting recognition systems are built to exploit the rules of natural language, not those of a programming language. In this paper, we explore the problem of handwriting recognition for source code. We collect and make publicly available a dataset of handwritten Python code samples from 15 participants, and we characterize the typical recognition errors that a state-of-the-art handwriting recognition tool makes on this code. We present an approach that improves recognition accuracy by augmenting a handwriting recognizer with the grammar rules of the programming language. Our experiment on the collected dataset shows an 8.6% word error rate and a 3.6% character error rate, which outperforms standard handwriting recognition systems and compares favorably to typing source code on virtual keyboards.
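
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how word error rate and character error rate are commonly computed: the Levenshtein edit distance between the recognized text and the reference, divided by the reference length. The function names, the whitespace tokenization of words, and the sample strings are assumptions made for illustration. The abstract does not specify how the paper tokenizes source code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    Works on strings (characters) and on lists (tokens).
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Character-level edit distance normalized by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def word_error_rate(reference, hypothesis):
    """Word-level edit distance normalized by reference word count.

    Assumption: words are split on whitespace, which may differ from
    the tokenization used in the paper.
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical example: the recognizer reads "(" as the letter "l".
reference = "for i in range(10):"
hypothesis = "for i in rangel10):"

cer = character_error_rate(reference, hypothesis)
wer = word_error_rate(reference, hypothesis)
print(f"CER = {cer:.3f}, WER = {wer:.3f}")
```

In this made-up example a single misread character corrupts one of four whitespace-separated words. This shows why the word error rate for source code tends to be higher than the character error rate.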
