Camera-Based Document Analysis and Recognition

This paper describes a new method for localizing color text in generic scene images that contain text in different scripts and at arbitrary orientations. A representative set of colors is first identified from edge information and used to initialize an unsupervised clustering algorithm. Text components are then identified in each color layer using a combination of a support vector machine and a neural network classifier trained on a set of low-level features derived from geometric, boundary, stroke, and gradient information. Experiments on camera-captured images with variable fonts, sizes, and colors, irregular layouts, non-uniform illumination, and multiple scripts illustrate the robustness of the method. The proposed method yields a precision of 0.8 and a recall of 0.86 on a database of 100 images. The method is also compared with others in the literature on the ICDAR 2003 robust reading competition dataset.
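The first stage (edge-derived colors seeding an unsupervised clustering step) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the edge detector, the color space, or the clustering algorithm, so the code below assumes a simple neighbour-difference edge test in RGB, coarse color binning to pick seeds, and a Lloyd-style k-means. The function names and the parameters `edge_thresh`, `bin_size`, and `max_seeds` are hypothetical.

```python
from collections import Counter

def color_dist2(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def edge_seed_colors(img, edge_thresh=60, bin_size=64, max_seeds=4):
    """Pick seed colors from pixels on either side of strong color edges.

    Assumption: an "edge" is a large RGB difference between a pixel and its
    right or lower neighbour; all parameter values are illustrative only.
    """
    h, w = len(img), len(img[0])
    bins = Counter()
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right and lower neighbour
                ny, nx = y + dy, x + dx
                if ny < h and nx < w and \
                        color_dist2(img[y][x], img[ny][nx]) > edge_thresh ** 2:
                    # Count the coarse color bin of both pixels across the edge
                    for p in (img[y][x], img[ny][nx]):
                        bins[tuple(c // bin_size for c in p)] += 1
    # Use the centre of each of the most frequent bins as a seed color
    return [tuple(b * bin_size + bin_size // 2 for b in key)
            for key, _ in bins.most_common(max_seeds)]

def cluster_colors(img, seeds, iterations=10):
    """Lloyd-style k-means over all pixels, initialised with the seed colors."""
    if not seeds:
        raise ValueError("no seed colors: the image contains no strong edges")
    pixels = [p for row in img for p in row]
    centers = [tuple(float(c) for c in s) for s in seeds]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: nearest center for every pixel
        labels = [min(range(len(centers)),
                      key=lambda k: color_dist2(p, centers[k]))
                  for p in pixels]
        # Update step: each center becomes the mean of its members
        for k in range(len(centers)):
            members = [p for p, l in zip(pixels, labels) if l == k]
            if members:
                centers[k] = tuple(sum(ch) / len(members) for ch in zip(*members))
    w = len(img[0])
    label_map = [labels[i:i + w] for i in range(0, len(labels), w)]
    return centers, label_map

# Synthetic 6x6 image: near-white background with a red 2x4 "stroke"
img = [[(250, 250, 250)] * 6 for _ in range(6)]
for y in (2, 3):
    for x in range(1, 5):
        img[y][x] = (200, 20, 20)

centers, label_map = cluster_colors(img, edge_seed_colors(img))
print(sorted(centers))  # [(200.0, 20.0, 20.0), (250.0, 250.0, 250.0)]
```

Each cluster label in `label_map` defines one color layer; in the pipeline described above, the connected components of each layer would then be passed to the classifiers. Seeding from edge-adjacent colors is one plausible reading of "using the edge information": text strokes produce strong edges, so their colors are well represented among the seeds even when the text covers a small fraction of the image.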
