Tools for the efficient generation of hand-drawn corpora based on context-free grammars

In sketch recognition systems, ground-truth data sets serve both to train and to test recognition algorithms. Unfortunately, generating data sets that are sufficiently large and varied is frequently a costly and time-consuming endeavour. In this paper, we present a novel technique for creating a large and varied ground-truthed corpus for hand-drawn math recognition. Candidate math expressions for the corpus are generated via random walks through a context-free grammar; the expressions are then transcribed by human writers, and an algorithm automatically generates ground-truth data for individual symbols and inter-symbol relationships within the math expressions. Although the techniques we develop in this paper are illustrated through the creation of a ground-truthed corpus of mathematical expressions, they are applicable to any sketching domain that can be described by a formal grammar.
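As a rough illustration of the first step described above, the following minimal sketch generates a candidate expression by a random walk through a context-free grammar. The toy grammar, the function name `random_walk`, and the `max_depth` cutoff are all assumptions made for this example; they are not the grammar or the algorithm used in the paper.

```python
import random

# Hypothetical toy grammar (an assumption for illustration only).
# Each nonterminal maps to a list of alternative productions; the first
# alternative of every nonterminal is the shortest, so it can be used
# to force the walk to terminate.
GRAMMAR = {
    "EXPR": [["TERM"], ["EXPR", "+", "TERM"], ["EXPR", "-", "TERM"]],
    "TERM": [["ATOM"], ["ATOM", "^", "{", "ATOM", "}"],
             ["\\frac", "{", "EXPR", "}", "{", "EXPR", "}"]],
    "ATOM": [["x"], ["y"], ["2"], ["(", "EXPR", ")"]],
}


def random_walk(symbol, rng, depth=0, max_depth=6):
    """Expand `symbol` into a list of terminal tokens by choosing
    productions at random. Past `max_depth`, only the first (shortest)
    production is allowed, which guarantees termination."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol: emit as-is
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        productions = productions[:1]
    production = rng.choice(productions)
    tokens = []
    for child in production:
        tokens.extend(random_walk(child, rng, depth + 1, max_depth))
    return tokens
```

Calling `" ".join(random_walk("EXPR", random.Random(42)))` yields one LaTeX-like token string that a human writer could then transcribe by hand. Passing a seeded `random.Random` instance keeps a generated corpus reproducible, and the depth cutoff is one simple way to keep expressions short enough to write; other ways of controlling expression size are equally plausible.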
