Statistical image differences, degradation features, and character distance metrics

Abstract. Document image quality is degraded by processes such as scanning, printing, and photocopying. The resulting bilevel image degradations can be categorized either by observable degradation features or by the parameters of a degradation model, and the degradation features can be related mathematically to the model parameters. In this paper we statistically compare pairs of populations of degraded character images created with different model parameters. The probability that two character populations were degraded with the same model parameters correlates with the relationship between the observable degradation features and those parameters. Two metrics of character difference are used: Hamming distance and moment feature distance. Knowing the conditions under which characters will be similar, and those under which they will differ, can guide the choice of parameters for future experiments.
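The two character-difference metrics named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. Characters are assumed to be equally sized, registered bilevel images stored as lists of 0/1 rows. The moment feature vector is assumed to be the scale-normalized central moments η_pq of orders 2 and 3, compared by Euclidean distance; the moment set and distance used in the paper may differ. The function names are hypothetical.

```python
import math
from typing import List

# Assumption: a bilevel character image is a list of rows of 0 (background) / 1 (ink).
Image = List[List[int]]


def hamming_distance(a: Image, b: Image) -> int:
    """Count the pixels that differ between two equally sized bilevel images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def central_moments(img: Image, max_order: int = 3) -> List[float]:
    """Scale-normalized central moments eta_pq for 2 <= p + q <= max_order.

    Assumed feature set for illustration; eta_pq = mu_pq / m00 ** (1 + (p + q) / 2).
    """
    ink = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = len(ink)  # for a bilevel image, the zeroth moment is the ink-pixel count
    if m00 == 0:
        raise ValueError("image contains no ink pixels")
    xc = sum(x for x, _ in ink) / m00  # centroid makes the moments translation invariant
    yc = sum(y for _, y in ink) / m00
    feats = []
    for order in range(2, max_order + 1):
        for p in range(order, -1, -1):
            q = order - p
            mu = sum((x - xc) ** p * (y - yc) ** q for x, y in ink)
            feats.append(mu / m00 ** (1 + order / 2))
    return feats


def moment_distance(a: Image, b: Image, max_order: int = 3) -> float:
    """Euclidean distance between the assumed moment feature vectors."""
    fa, fb = central_moments(a, max_order), central_moments(b, max_order)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(fa, fb)))


if __name__ == "__main__":
    a = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0]]
    shifted = [[0, 0, 1, 1],
               [0, 0, 1, 1],
               [0, 0, 1, 1]]
    print(hamming_distance(a, shifted))  # 6
    print(moment_distance(a, shifted))   # 0.0
```

In this toy case, shifting a stroke by one pixel changes six pixels in Hamming terms but leaves the central moments unchanged. This is one way the two metrics can disagree about whether two degraded characters are similar.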
