Generation of synthetic documents for performance evaluation of symbol recognition & spotting systems

This paper deals with the performance evaluation of symbol recognition and spotting systems. We propose a new approach to generating synthetic graphical documents that contain non-isolated symbols in a realistic context. The approach is based on a set of constraints that govern how symbols are placed on a predefined background according to the properties of a particular domain (architecture, electronics, engineering, etc.). In this way, a large number of images resembling real documents can be obtained simply by defining the set of constraints and providing a few predefined backgrounds. Because the documents are generated synthetically, the ground truth (the location and the label of every symbol) is available automatically. We have applied this approach to generate a large database of architectural drawings and electronic diagrams, which illustrates the flexibility of the system. Performance evaluation experiments with a symbol localization system show that our approach can generate documents with different features, and that these differences are reflected in variations in the localization results.
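The idea of constraint-based placement with automatically available ground truth can be illustrated with a minimal sketch. It assumes that symbols are axis-aligned rectangles, that the background is described by named rectangular regions (for example, rooms in a floor plan), and that the only constraints are "lie inside a region where this symbol is allowed" and "do not overlap another symbol". The names `Box`, `SymbolModel` and `generate` are hypothetical; this is not the authors' constraint language, and rendering the symbol images onto the background is omitted.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: top-left corner (x, y), width w, height h."""
    x: int
    y: int
    w: int
    h: int

    def inside(self, other: "Box") -> bool:
        # True if this box lies entirely within `other`.
        return (other.x <= self.x and other.y <= self.y
                and self.x + self.w <= other.x + other.w
                and self.y + self.h <= other.y + other.h)

    def overlaps(self, other: "Box") -> bool:
        # Boxes that only touch along an edge are not considered overlapping.
        return not (self.x + self.w <= other.x or other.x + other.w <= self.x
                    or self.y + self.h <= other.y or other.y + other.h <= self.y)


@dataclass(frozen=True)
class SymbolModel:
    label: str
    w: int
    h: int
    # Hypothetical domain constraint: names of background regions
    # in which this symbol may appear.
    allowed_regions: Tuple[str, ...]


def generate(background_regions: Dict[str, Box], models: List[SymbolModel],
             n_symbols: int, seed: int = 0,
             max_tries: int = 1000) -> List[Tuple[str, Box]]:
    """Place up to n_symbols on the background by rejection sampling.

    Returns the ground truth as (label, bounding box) pairs.
    """
    rng = random.Random(seed)
    placed: List[Tuple[str, Box]] = []
    for _ in range(n_symbols):
        model = rng.choice(models)
        for _ in range(max_tries):
            region = background_regions[rng.choice(model.allowed_regions)]
            if region.w < model.w or region.h < model.h:
                continue  # symbol cannot fit in this region
            x = rng.randint(region.x, region.x + region.w - model.w)
            y = rng.randint(region.y, region.y + region.h - model.h)
            candidate = Box(x, y, model.w, model.h)
            # Constraint: no overlap with any symbol already placed.
            if all(not candidate.overlaps(b) for _, b in placed):
                placed.append((model.label, candidate))
                break
        # If no valid position was found within max_tries, the symbol is skipped.
    return placed


if __name__ == "__main__":
    rooms = {"bedroom": Box(0, 0, 200, 150), "bathroom": Box(200, 0, 100, 150)}
    models = [SymbolModel("bed", 60, 40, ("bedroom",)),
              SymbolModel("sink", 20, 20, ("bathroom",)),
              SymbolModel("door", 10, 30, ("bedroom", "bathroom"))]
    for label, box in generate(rooms, models, n_symbols=5, seed=42):
        print(label, box)
```

Rejection sampling keeps the sketch short: a symbol that cannot be placed within `max_tries` attempts is simply dropped, so the result may contain fewer than `n_symbols` entries. Because every placement is recorded as it is made, the returned list is exactly the ground truth for the generated document, and fixing `seed` makes the document reproducible.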
