Paper Information - Camera-Based Document Analysis and Recognition


This paper describes a new method for localizing color text in generic scene images that contain text in different scripts and arbitrary orientations. A representative set of colors is first identified from edge information and used to initialize an unsupervised clustering algorithm. Text components are then identified in each color layer using a combination of a support vector machine and a neural network classifier trained on a set of low-level features derived from geometric, boundary, stroke and gradient information. Experiments on camera-captured images with variable fonts, sizes and colors, irregular layouts, non-uniform illumination and multiple scripts illustrate the robustness of the method. The proposed method yields a precision of 0.8 and a recall of 0.86 on a database of 100 images. The method is also compared with others in the literature using the ICDAR 2003 robust reading competition dataset.
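
The abstract does not specify how the representative colors are chosen or which clustering algorithm is used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that colors on either side of a strong color edge are good candidates for seeds, and it substitutes plain k-means for the unnamed unsupervised clustering step. The function names (`edge_seed_colors`, `cluster_colors`) and all parameters (`edge_threshold`, `step`, `max_seeds`, `iterations`) are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def color_distance_sq(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def edge_seed_colors(image, edge_threshold=60, step=64, max_seeds=4):
    """Pick seed colors from pixels next to strong color edges.

    Assumed heuristic: both pixels across an edge vote for their coarse
    color bin, and the most frequent bins become the seeds.
    """
    height, width = len(image), len(image[0])
    bins = Counter()
    for y in range(height):
        for x in range(width):
            for dy, dx in ((0, 1), (1, 0)):  # right and lower neighbors
                ny, nx = y + dy, x + dx
                if ny < height and nx < width:
                    p, q = image[y][x], image[ny][nx]
                    if color_distance_sq(p, q) > edge_threshold ** 2:
                        for color in (p, q):
                            bins[tuple(v // step for v in color)] += 1
    # Use the center of each winning bin as the seed color.
    return [tuple(b * step + step // 2 for b in key)
            for key, _ in bins.most_common(max_seeds)]


def cluster_colors(image, seeds, iterations=10):
    """Run k-means from the given seeds; return (centers, label_map)."""
    if not seeds:
        raise ValueError("no seed colors supplied")
    pixels = [p for row in image for p in row]
    centers = [tuple(float(v) for v in s) for s in seeds]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest center for every pixel.
        labels = [min(range(len(centers)),
                      key=lambda k: color_distance_sq(p, centers[k]))
                  for p in pixels]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(ch) / len(members)
                                   for ch in zip(*members))
    width = len(image[0])
    label_map = [labels[i:i + width] for i in range(0, len(labels), width)]
    return centers, label_map
```

Each distinct label in `label_map` corresponds to one color layer. In a pipeline like the one described above, connected components would then be extracted per layer and passed to the text/non-text classifiers, a stage this sketch deliberately leaves out.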

Takeo Kanade | Alfred Kobsa | Moni Naor | Josef Kittler | John C. Mitchell
