Natural Scene Text Understanding

In a society driven by visual information and with the dramatic spread of low-cost cameras, vision techniques receive increasing attention. Text recognition, which belongs to the broader field of text understanding, is nowadays changing fast.

Previously, text recognition dealt only with documents acquired with flatbed, sheet-fed or mounted imaging devices. More recently, handheld scanners such as pen-scanners appeared, capturing small portions of text on a fairly planar surface such as a business card. With these devices, the issues affecting image processing are limited to sensor noise, document skew and degradations inherent to the document itself. Built on this classical acquisition method, optical character recognition (OCR) systems have been developed over many years and reach high recognition rates on constrained documents, meaning documents with a traditional layout and a relatively clean background (regular letters, forms, faxes, checks and so on) captured at a sufficient resolution (at least 300 dots per inch (dpi)).

With the recent explosion of handheld imaging devices (HIDs), i.e. digital cameras, either standalone or embedded in cellular phones or personal digital assistants (PDAs), research on document image analysis has entered a new era in which breakthroughs are required: traditional document analysis systems fail on this new and promising acquisition mode, and the main differences and reasons for failure will be detailed in this section. Because these devices are small, light and handy, they remove all acquisition constraints: any object, including natural scenes (NS) in streets, at home or on planes, may now be captured. Moreover, recent studies [Kim, 2005] reported a decline in scanner sales while projecting that sales of HIDs will keep increasing over the next 10 years.
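To make the 300 dpi requirement concrete, the short Python sketch below converts a target resolution into a pixel budget for a page and, conversely, estimates the resolution a sensor achieves when it frames a page. The A4 page size (210 × 297 mm) and the 640-pixel-wide sensor are illustrative assumptions, not figures from the text, and the helper names `pixels_needed` and `effective_dpi` are hypothetical.

```python
# Illustrative arithmetic: pixel budget for a page at a target resolution.
# Assumptions (not from the text): an A4 page (210 x 297 mm) and a
# hypothetical 640-pixel-wide sensor framing the full page width.

MM_PER_INCH = 25.4


def pixels_needed(width_mm: float, height_mm: float, dpi: float) -> tuple[int, int]:
    """Return the (width, height) in pixels needed to sample a page at `dpi`."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def effective_dpi(sensor_px: int, page_mm: float) -> float:
    """Resolution obtained when `sensor_px` pixels span `page_mm` millimetres."""
    return sensor_px / (page_mm / MM_PER_INCH)


if __name__ == "__main__":
    w, h = pixels_needed(210, 297, 300)       # A4 portrait at 300 dpi
    print(w, h, round(w * h / 1e6, 1))        # -> 2480 3508 8.7
    # Hypothetical 640-pixel-wide sensor framing the full A4 width
    print(round(effective_dpi(640, 210)))     # -> 77
```

Under these assumptions, an A4 page at 300 dpi needs roughly 8.7 million pixels, whereas a 640-pixel-wide sensor framing the whole page width yields only about 77 dpi, far below the resolution classical OCR systems are designed for.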

[1] Francisco Martínez-Verdú et al. Comparison Between the Number of Discernible Colors in a Digital Camera and the Human Eye. 2004, CGIV.

[2] William J. Byrne et al. A Generative Probabilistic OCR Model for NLP Applications. 2003, NAACL.

[3] Satoshi Goto et al. A robust algorithm for text detection in color images. 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[4] K. Plataniotis et al. Color Image Processing and Applications. 2000.

[5] Masatoshi Okutomi et al. Super-resolution under image deformation. 2004, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004).

[6] Jonathan Rigelsford et al. Colour Image Processing and Applications Digital Signal Processing. 2001.

[7] Michael Hild. Color similarity measures for efficient color classification. 2004.

[8] Horst Bunke et al. Fast approximate matching of words against a dictionary. 1995, Computing.

[9] Hae Yong Kim. Segmentation-free printed character recognition by relaxed nearest neighbor learning of windowed operator. 1999, XII Brazilian Symposium on Computer Graphics and Image Processing (Cat. No.PR00481).

[10] J. D. van Ouwerkerk et al. Image super-resolution survey. 2006, Image Vis. Comput.

[11] Mehryar Mohri et al. Finite-State Transducers in Language and Speech Processing. 1997, CL.

[12] Ioannis Pratikakis et al. Towards Text Recognition in Natural Scene Images. 2005.

[13] Chi Fang et al. Deciphering algorithms for degraded document recognition. 1998.

[14] A. Murat Tekalp et al. Superresolution video reconstruction with arbitrary sampling lattices and nonzero aperture time. 1997, IEEE Trans. Image Process.

[15] David S. Doermann et al. Text enhancement in digital video using multiple frame integration. 1999, MULTIMEDIA '99.

[16] Zhuowen Tu et al. Image Parsing: Unifying Segmentation, Detection, and Recognition. 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[17] Ming-Chao Chiang et al. Local blur estimation and super-resolution. 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18] Charles Elkan et al. Using the Triangle Inequality to Accelerate k-Means. 2003, ICML.

[19] Horst Bunke et al. Identification of text on colored book and journal covers. 1999, Proceedings of the Fifth International Conference on Document Analysis and Recognition (ICDAR '99) (Cat. No.PR00318).

[20] Daniel P. Lopresti et al. OCR for World Wide Web images. 1997, Electronic Imaging.

[21] Alexander Dekhtyar et al. Information Retrieval. 2018, Lecture Notes in Computer Science.

[22] Hang Joon Kim et al. Segmentation of touching characters using an MLP. 1998, Pattern Recognit. Lett.

[23] Majid Mirmehdi et al. Super-Resolution Text using the Teager Filter. 2005.

[24] Michal Irani et al. Improving resolution by image registration. 1991, CVGIP Graph. Model. Image Process.

[25] Christopher R. Dance et al. Binarising camera images for OCR. 2001, Proceedings of Sixth International Conference on Document Analysis and Recognition.

[26] Michael Elad et al. Fast and robust multiframe super resolution. 2004, IEEE Transactions on Image Processing.

[27] Michael I. Jordan et al. On Spectral Clustering: Analysis and an algorithm. 2001, NIPS.

[28] Xiaofan Lin et al. Impact of imperfect OCR on part-of-speech tagging. 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.

[29] Lawrence O. Hall et al. Text extraction from color documents-clustering approaches in three and four dimensions. 2001, Proceedings of Sixth International Conference on Document Analysis and Recognition.

[30] David Capel et al. Image Mosaicing and Super-resolution. 2004, Distinguished Dissertations.

[31] Xilin Chen et al. Incremental detection of text on road signs from video with application to a driving assistant system. 2004, MULTIMEDIA '04.

[32] Y. Ohta. Knowledge-based interpretation of outdoor natural color scenes. 1998.

[33] Chew Lim Tan et al. Adaptive Region Growing Color Segmentation for Text Using Irregular Pyramid. 2004, Document Analysis Systems.

[34] Matti Pietikäinen et al. Adaptive document image binarization. 2000, Pattern Recognit.

[35] Pavel Berkhin et al. A Survey of Clustering Data Mining Techniques. 2006, Grouping Multidimensional Data.

[36] Slawomir Wesolkowski et al. Color Image Edge Detection and Segmentation: A Comparison of the Vector Angle and the Euclidean Distance Color Similarity Measures. 1999.

[37] Bernard Gosselin et al. Segmentation-Based Binarization for Color Degraded Images. 2004, ICCVG.

[38] Yoshua Bengio et al. Gradient-based learning applied to document recognition. 1998, Proc. IEEE.

[39] Majid Mirmehdi et al. An Introduction to Super-Resolution Text. 2007.

[40] Takeo Kanade et al. Super-Resolution Optical Flow. 1999.

[41] Karen Kukich et al. Techniques for automatically correcting words in text. 1992, CSUR.

[42] Joydeep Ghosh et al. Relationship-based clustering and cluster ensembles for high-dimensional data mining. 2002.

[43] Tony F. Chan et al. Total variation blind deconvolution. 1998, IEEE Trans. Image Process.

[44] B. Gosselin et al. Robust Thresholding Based on Wavelets and Thinning Algorithms for Degraded Camera Images. 2004.

[45] Andrew Zisserman et al. Super-resolution enhancement of text image sequences. 2000, Proceedings 15th International Conference on Pattern Recognition (ICPR-2000).

[46] Simon Haykin et al. Gradient-Based Learning Applied to Document Recognition. 2001.

[47] R. Mukundan et al. Discrete vs. Continuous Orthogonal Moments for Image Analysis. 2001.

[48] David L. Neuhoff et al. The Viterbi algorithm as an aid in text recognition (Corresp.). 1975, IEEE Trans. Inf. Theory.

[49] Michael Elad et al. Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images. 1997, IEEE Trans. Image Process.

[50] Michael Droettboom. Correcting broken characters in the recognition of historical printed documents. 2003, 2003 Joint Conference on Digital Libraries, 2003. Proceedings.

[51] Bin Wang et al. Color text image binarization based on binary texture analysis. 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[52] N. Otsu. A threshold selection method from gray level histograms. 1979.

[53] Simon M. Lucas et al. Web-Based Deployment of Text Locating Algorithms. 2005.

[54] Lawrence O'Gorman et al. Document Image Analysis. 1996.

[55] Bernard Gosselin et al. Spatial and Color Spaces Combination for Natural Scene Text Extraction. 2006, 2006 International Conference on Image Processing.

[56] Toru Wakahara et al. Segmentation and recognition of characters in scene images using selective binarization in color space and GAT correlation. 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[57] C. Garcia et al. Text detection and segmentation in complex color images. 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[58] G. W. Stewart et al. On the Early History of the Singular Value Decomposition. 1993, SIAM Rev.

[59] Jun Li et al. Design and implementation of a card reader based on build-in camera. 2004, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004).

[60] Yasuaki Nakano et al. Segmentation methods for character recognition: from segmentation to document structure analysis. 1992, Proc. IEEE.

[61] Matei Mancas et al. Camera-based degraded character segmentation into individual components. 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[62] A.W.M. Smeulders et al. An introduction to image processing. 1991.

[63] Supun Samarasekera et al. Super-fusion: a super-resolution method based on fusion. 2002, Object recognition supported by user interaction for service robots.

[64] David Salesin et al. Image Analogies. 2001, SIGGRAPH.

[65] Multitel Asbl. Compilation de Règles de Réécriture en transducteurs à états finis. 2006.

[66] Rainer Lienhart et al. Localizing and segmenting text in images and videos. 2002, IEEE Trans. Circuits Syst. Video Technol.

[67] Daniel P. Lopresti et al. Using Consensus Sequence Voting to Correct OCR Errors. 1997, Comput. Vis. Image Underst.

[68] Bernard Gosselin et al. An Embedded Application for Degraded Text Recognition. 2005, EURASIP J. Adv. Signal Process.

[69] Giovanni Ramponi et al. The rational filter for image smoothing. 1996, IEEE Signal Processing Letters.

[70] Edward K. Wong et al. A new robust algorithm for video text extraction. 2003, Pattern Recognit.

[71] Steven A. Shafer et al. Using color to separate reflection components. 1985.

[72] Robert C. Bolles et al. Recognition of Text in 3-D Scenes. 2001.

[73] Jin Hyung Kim et al. An example-based prior model for text image super-resolution. 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[74] Eric Lecolinet et al. A Survey of Methods and Strategies in Character Segmentation. 1996, IEEE Trans. Pattern Anal. Mach. Intell.

[75] Akira Suzuki et al. Kanji recognition in scene images without detection of text fields - robust against variation of viewpoint, contrast, and background texture. 2004, ICPR 2004.

[76] B. Gosselin et al. Combination of binarization and character segmentation using color information. 2004, Proceedings of the Fourth IEEE International Symposium on Signal Processing and Information Technology, 2004.

[77] Kongqiao Wang et al. Character location in scene images from digital camera. 2003, Pattern Recognit.

[78] David J. Crandall et al. Robust extraction of text in video. 2000, Proceedings 15th International Conference on Pattern Recognition (ICPR-2000).

[79] D J Field et al. Relations between the statistics of natural images and the response properties of cortical cells. 1987, Journal of the Optical Society of America. A, Optics and image science.

[80] Shinichiro Omachi et al. Isolated Character Recognition by Searching Features in Scene Images. 2005.

[81] Satoshi Naoi et al. Low resolution character recognition by dual eigenspace and synthetic degraded patterns. 2004, HDP '04.

[82] Daniel P. Lopresti et al. Locating and Recognizing Text in WWW Images. 2000, Information Retrieval.

[83] Arturo de la Escalera et al. A visual landmark recognition system for topological navigation of mobile robots. 2001, Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164).

[84] David S. Doermann et al. Camera-based analysis of text and documents: a survey. 2005, International Journal of Document Analysis and Recognition (IJDAR).

[85] Christopher G. Harris et al. A Combined Corner and Edge Detector. 1988, Alvey Vision Conference.

[86] Anil K. Jain et al. Statistical Pattern Recognition: A Review. 2000, IEEE Trans. Pattern Anal. Mach. Intell.

[87] Céline Mancas-Thillou et al. Embedded reading device for blind people: a user-centered design. 2004, 33rd Applied Imagery Pattern Recognition Workshop (AIPR'04).

[88] Min C. Shin et al. Does colorspace transformation make any difference on skin detection? 2002, Sixth IEEE Workshop on Applications of Computer Vision (WACV 2002). Proceedings.

[89] Majid Mirmehdi et al. Extracting Low Resolution Text with an Active Camera for OCR. 2001.

[90] Thomas S. Huang et al. Image processing. 1971.

[91] Bernard Gosselin et al. Mobile Reading Assistant for Blind People. 2004.

[92] André Marion et al. Introduction to Image Processing. 1990, Springer US.

[93] In-Jung Kim et al. Multi-window binarization of camera image for document recognition. 2004, Ninth International Workshop on Frontiers in Handwriting Recognition.

[94] Alex Waibel et al. Text Detection and Translation from Natural Scenes. 2001.

[95] Josef Kittler et al. Floating search methods for feature selection with nonmonotonic criterion functions. 1994, Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No.94CH3440-5).

[96] A. G. Ramakrishnan et al. Text Localization and Extraction from Complex Color Images. 2005, ISVC.

[97] Weiqiang Wang et al. A Robust Text Segmentation Approach in Complex Background Based on Multiple Constraints. 2005, PCM.

[98] S. Lucas et al. ICDAR 2003 robust reading competitions: entries, results, and future directions. 2005, International Journal of Document Analysis and Recognition (IJDAR).

[99] Akira Suzuki et al. Kanji recognition in scene images without detection of text fields - robust against variation of viewpoint, contrast, and background texture. 2004, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004).

[100] Anil K. Jain et al. Text information extraction in images and video: a survey. 2004, Pattern Recognit.

[101] Stefano Messelodi et al. Automatic identification and skew estimation of text lines in real scene images. 1999, Pattern Recognition.

[102] Mark Goadrich et al. The relationship between Precision-Recall and ROC curves. 2006, ICML.

[103] Vladimir I. Levenshtein et al. Binary codes capable of correcting deletions, insertions, and reversals. 1965.

[104] P.V.C. Hough et al. Machine Analysis of Bubble Chamber Pictures. 1959.

[105] Bernard Gosselin et al. A Prolongation-Based Approach for Recognizing Cut Characters. 2004, ICCVG.

[106] H Stark et al. High-resolution image recovery from image-plane arrays, using convex projections. 1989, Journal of the Optical Society of America. A, Optics and image science.

[107] Luke Fletcher et al. Super-resolving Signs for Classification. 2004.

[108] Jean-Michel Jolion et al. Text localization, enhancement and binarization in multimedia documents. 2002, Object recognition supported by user interaction for service robots.

[109] Nobuo Ezaki et al. Text detection from natural scene images: towards a system for visually impaired persons. 2004, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004).

[110] Nobuo Ezaki et al. Text detection from natural scene images: towards a system for visually impaired persons. 2004, ICPR 2004.

[111] B. Freisleben et al. Finding text in images via local thresholding. 2003, Proceedings of the 3rd IEEE International Symposium on Signal Processing and Information Technology (IEEE Cat. No.03EX795).

[112] Bülent Sankur et al. Survey over image thresholding techniques and quantitative performance evaluation. 2004, J. Electronic Imaging.

[113] Sanjit K. Mitra et al. Nonlinear image processing. 2000.

[114] A. Bors et al. A variational approach for color image segmentation. 2004, ICPR 2004.

[115] Bernard Gosselin et al. Character Segmentation-by-Recognition Using Log-Gabor Filters. 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[116] Michael Elad et al. Advances and challenges in super-resolution. 2004, Int. J. Imaging Syst. Technol.

[117] Larry S. Davis et al. A video-based framework for the analysis of presentations/posters. 2004, International Journal of Document Analysis and Recognition (IJDAR).

[118] Maurizio Pilu et al. Building cameras for capturing documents. 2005, International Journal of Document Analysis and Recognition (IJDAR).

[119] Xilin Chen et al. A robust approach for recognition of text embedded in natural scenes. 2002, Object recognition supported by user interaction for service robots.

[120] D. Ruderman et al. Statistics of cone responses to natural images: implications for visual coding. 1998.

[121] Bernard Gosselin et al. Color binarization for complex camera-based images. 2005, IS&T/SPIE Electronic Imaging.

[122] T. S. Huang et al. Advances in computer vision & image processing. 1988.

[123] Matei Mancas et al. Segmentation en caractères individuels dans des images de scènes naturelles. 2005.

[124] Katherine Donaldson et al. Bayesian Super-Resolution of Text in Video with a Text-Specific Bimodal Prior. 2005, CVPR.

[125] D. Rubin et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion). 1977.

[126] Gaurav Sharma. Digital Color Imaging Handbook. 2002.

[127] Abdel Belaïd et al. Neural based binarization techniques. 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[128] Kazem Taghva et al. OCRSpell: an interactive spelling correction system for OCR errors in text. 2001, International Journal on Document Analysis and Recognition.

[129] Kazuhiko Yamamoto et al. Development of a guide dog system for the blind people with character recognition ability. 2004, ICPR 2004.

[130] Yann LeCun et al. Djvu: Un systeme de compression d'images pour la distribution reticulaire de documents numerises (Djvu: An image compression system for distributing scanned document on the internet). 2000.

[131] Mohamed S. Kamel et al. Extraction of Binary Character/Graphics Images from Grayscale Document Images. 1993, CVGIP Graph. Model. Image Process.

[133] Brian A. Barsky et al. Advanced Renderman: Creating CGI for Motion Pictures. 1999.

[134] Bernard Gosselin et al. Sypole: A mobile assistant for the blind. 2005, 2005 13th European Signal Processing Conference.

[135] Robert C. Bolles et al. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. 1981, CACM.

[136] Datong Chen et al. Text detection and recognition in images and video sequences. 2003.

[137] Hubert Emptoz et al. A Recursive Approach For Bleed-Through removal. 2005.

[138] Anil K. Jain et al. Unsupervised Learning of Finite Mixture Models. 2002, IEEE Trans. Pattern Anal. Mach. Intell.

[139] Nicolas Vandenbroucke. Segmentation d'images couleur par classification de pixels dans des espaces d'attributs colorimétriques adaptés : application à l'analyse d'images de football. 2000.

[140] Bernard Gosselin et al. Color text extraction with selective metric-based clustering. 2007, Comput. Vis. Image Underst.

[141] Toru Wakahara et al. Determining optimal filters for binarization of degraded grayscale characters using genetic algorithms. 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[142] Katherine Donaldson et al. Bayesian super-resolution of text in video with a text-specific bimodal prior. 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[143] Apostolos Antonacopoulos et al. Text extraction from Web images based on a split-and-merge segmentation method using colour perception. 2004, ICPR 2004.

[144] Geoffrey E. Hinton et al. Learning internal representations by error propagation. 1986.

[145] Bui Tuong Phong. Illumination for computer generated images. 1973.

[146] Chein-I Chang et al. Unsupervised approach to color video thresholding. 2004.

[147] Dorin Comaniciu et al. Nonparametric robust methods for computer vision. 2000.

[148] Bin Wang. Minimum entropy approach to word segmentation problems. 2000, physics/0008232.

[149] Bernard Gosselin et al. From Picture to Speech: an Innovative Application for Embedded Environment. 2003.

[150] Chin-Teng Lin et al. Detection and compensation algorithm for backlight images with fuzzy logic and adaptive compensation curve. 2005, Int. J. Pattern Recognit. Artif. Intell.

[151] Gunther Wyszecki et al. Color Science: Concepts and Methods, Quantitative Data and Formulae, 2nd Edition. 2000.

[152] Fernando Pereira et al. Weighted Automata in Text and Speech Processing. 2005, ArXiv.

[153] Shmuel Peleg et al. Image sequence enhancement using sub-pixel displacements. 1988, Proceedings CVPR '88: The Computer Society Conference on Computer Vision and Pattern Recognition.

[154] Andrew J. Patti et al. Super Resolution Video Reconstruction with Arbitrary Sampling Lattices and Non-zero Aperture Time. 1997.

[155] Bernard Gosselin et al. Color text extraction from camera-based images: the impact of the choice of the clustering distance. 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[156] K. Martin et al. Vector filtering for color imaging. 2005, IEEE Signal Processing Magazine.

[157] Sang-Cheol Park et al. Text Locating from Natural Scene Images Using Image Intensities. 2005, ICDAR.

[158] A. K. Rigler et al. Accelerating the convergence of the back-propagation method. 1988, Biological Cybernetics.