Camera-based analysis of text and documents: a survey

Abstract. The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or wearable computers, and standalone image or video devices are highly mobile and easy to use; they can capture images of thick books, historical manuscripts too fragile to touch, and text in scenes, making them much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there will clearly be a demand in many domains. Traditional scanner-based document analysis techniques provide us with a good reference and starting point, but they cannot be used directly on camera-captured images. Camera-captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction of the content and background. In this paper we present a survey of application domains, technical challenges, and solutions for the analysis of documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We discuss document analysis from a single camera-captured image as well as multiple frames and highlight some sample applications under development and feasible ideas for future development.
