A Study of the Variability of Very Low Resolution Characters and the Feasibility of Their Discrimination Using Geometrical Features

Current OCR technology cannot accurately recognize small text images, such as those found in web images. Our goal is to investigate new approaches to recognizing very low resolution text images containing antialiased character shapes. This paper presents a preliminary study of the variability of such characters and of the feasibility of discriminating them using geometrical features. In a first stage, we analyze the distribution of these features. In a second stage, we study their discriminative power for recognizing isolated characters across various rendering methods and font properties. Finally, we present the results of our evaluation tests, which lead to our conclusions and future focus.

Keywords—World Wide Web, document analysis, pattern recognition, Optical Character Recognition.
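The abstract does not specify which geometrical features are used. Purely as an illustration, the sketch below assumes a glyph is a small grid of antialiased pixel coverages in [0, 1] and computes intensity-weighted features (ink mass, centroid, bounding box, mass-normalized second-order central moments). It also scores one feature's discriminative power between two character classes with a Fisher ratio. The names `geometric_features`, `GeometricFeatures` and `fisher_ratio`, and the choice of moments and Fisher ratio, are hypothetical assumptions, not the method of this paper.

```python
from dataclasses import dataclass
from statistics import mean, pvariance

# Assumption: a glyph is a list of rows of antialiased pixel coverage in [0, 1].
Glyph = list[list[float]]


@dataclass
class GeometricFeatures:
    mass: float   # total ink coverage (sum of pixel values)
    cx: float     # intensity-weighted centroid, column
    cy: float     # intensity-weighted centroid, row
    width: int    # bounding-box width of pixels with non-zero coverage
    height: int   # bounding-box height of pixels with non-zero coverage
    mu20: float   # mass-normalized horizontal second-order central moment
    mu02: float   # mass-normalized vertical second-order central moment
    mu11: float   # mass-normalized mixed second-order central moment


def geometric_features(glyph: Glyph) -> GeometricFeatures:
    """Hypothetical feature set: partial (antialiased) pixels contribute fractionally."""
    mass = sum(v for row in glyph for v in row)
    if mass == 0:
        raise ValueError("empty glyph")
    cx = sum(x * v for row in glyph for x, v in enumerate(row)) / mass
    cy = sum(y * v for y, row in enumerate(glyph) for v in row) / mass
    mu20 = sum((x - cx) ** 2 * v for row in glyph for x, v in enumerate(row)) / mass
    mu02 = sum((y - cy) ** 2 * v for y, row in enumerate(glyph) for v in row) / mass
    mu11 = sum((x - cx) * (y - cy) * v
               for y, row in enumerate(glyph) for x, v in enumerate(row)) / mass
    xs = [x for row in glyph for x, v in enumerate(row) if v > 0]
    ys = [y for y, row in enumerate(glyph) if any(v > 0 for v in row)]
    return GeometricFeatures(mass, cx, cy, max(xs) - min(xs) + 1,
                             max(ys) - min(ys) + 1, mu20, mu02, mu11)


def fisher_ratio(a: list[float], b: list[float]) -> float:
    """Between-class separation over within-class spread for one feature (higher = more discriminative)."""
    spread = pvariance(a) + pvariance(b)
    return float("inf") if spread == 0 else (mean(a) - mean(b)) ** 2 / spread


if __name__ == "__main__":
    # A 5x5 vertical stroke with antialiased edges, and its transpose (a horizontal stroke).
    vertical = [[0.0, 0.5, 1.0, 0.5, 0.0] for _ in range(5)]
    horizontal = [list(col) for col in zip(*vertical)]
    print(geometric_features(vertical))
    print(geometric_features(horizontal))
```

For the vertical stroke, `mu20` is 0.5 and `mu02` is 2.0, and the transposed stroke swaps them, so these two moments separate the two shapes even at this size. Weighting by intensity rather than thresholding is a design choice in this sketch: at very low resolution, binarizing antialiased pixels would discard much of the little shape information available.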
