Sketch interpretation using multiscale stochastic models of temporal patterns

Sketching is a natural mode of interaction used in a variety of settings. For example, people sketch during early design and brainstorming sessions to guide the thought process, and when communicating we use sketching as an additional modality to convey ideas that cannot be put into words. The emergence of hardware such as PDAs and Tablet PCs has made it possible to capture freehand sketches, enabling the routine use of sketching as an additional human-computer interaction modality. But despite the availability of pen-based information capture hardware, relatively little effort has been put into developing software capable of understanding and reasoning about sketches. To date, most approaches to sketch recognition have treated sketches as images (i.e., static finished products) and have applied vision algorithms for recognition. However, unlike images, sketches are produced incrementally and interactively, one stroke at a time, and their processing should take advantage of this.
This thesis explores ways of doing sketch recognition by extracting as much information as possible from temporal patterns that appear during sketching. We present a sketch recognition framework based on hierarchical statistical models of temporal patterns. We show that in certain domains, the stroke orderings used in the course of drawing individual objects contain temporal patterns that can aid recognition. We build on this work to show how sketch recognition systems can use knowledge of both common stroke orderings and common object orderings. We describe a statistical framework based on Dynamic Bayesian Networks that can learn temporal models of object-level and stroke-level patterns for recognition. Our framework supports multi-object strokes and multi-stroke objects, and allows interspersed drawing of objects, relaxing the assumption that objects are drawn one at a time. Our system also supports real-valued feature representations using a numerically stable recognition algorithm.
We present recognition results for hand-drawn electronic circuit diagrams. The results show that modeling temporal patterns at multiple scales provides a significant increase in correct recognition rates, with no added computational penalties. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
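
To make the idea of recognition from stroke-level temporal patterns with real-valued features concrete, here is a minimal sketch. It is not the thesis's Dynamic Bayesian Network framework: it swaps in a plain hidden Markov model decoded with the Viterbi algorithm, and it works in log space as one common way to keep the computation numerically stable. The state names ("line", "arc"), the single curvature-like feature, and every parameter value are hypothetical and chosen only for illustration.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian; models a real-valued stroke feature."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi_log(obs, states, log_init, log_trans, emit):
    """Most likely state sequence for obs, computed entirely in log space.

    obs       : list of real-valued features, one per stroke (assumed)
    log_init  : dict state -> log initial probability
    log_trans : dict state -> dict state -> log transition probability
    emit      : dict state -> (mean, var) of the Gaussian emission
    """
    # delta[s] is the best log score of any path that ends in state s.
    delta = {s: log_init[s] + log_gauss(obs[0], *emit[s]) for s in states}
    back = []  # back[t][s] is the best predecessor of s at step t + 1
    for x in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            ptr[s] = best
            delta[s] = prev[best] + log_trans[best][s] + log_gauss(x, *emit[s])
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

# Hypothetical two-state model: strokes are either straight lines or arcs.
STATES = ["line", "arc"]
LOG_INIT = {s: math.log(0.5) for s in STATES}
LOG_TRANS = {
    "line": {"line": math.log(0.8), "arc": math.log(0.2)},
    "arc": {"line": math.log(0.2), "arc": math.log(0.8)},
}
EMIT = {"line": (0.0, 0.05), "arc": (1.0, 0.05)}  # (mean, variance)

if __name__ == "__main__":
    features = [0.05, -0.02, 0.9, 1.1, 0.03]
    labels, score = viterbi_log(features, STATES, LOG_INIT, LOG_TRANS, EMIT)
    print(labels, score)
```

Summing log probabilities instead of multiplying raw probabilities keeps the score finite on long stroke sequences, where a product of many small or large densities would underflow or overflow. A hierarchical model of the kind the abstract describes would add an object-level layer above these stroke-level states.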
