A Statistical, Nonparametric Methodology for Document Degradation Model Validation

Printing, photocopying, and scanning degrade the image quality of a document. Statistical models of these degradation processes are crucial for document image understanding research. In this paper, we present a statistical methodology for validating local degradation models. The method is based on a nonparametric, two-sample permutation test. Another standard statistical device, the power function, is then used to choose among algorithm variables such as distance functions. Since the validation and power function procedures are independent of the model, they can be used to validate any other degradation model. We also describe a method for comparing any two models: it uses the p-values associated with the estimated models to select the model that more closely matches real degraded documents.
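
The two-sample permutation test and the p-value comparison described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each document sample has already been reduced to a scalar feature, and the names `permutation_p_value`, `mean_difference`, `close_model`, and `far_model` are hypothetical. The distance function is passed in as a parameter, mirroring the idea that the distance is an algorithm variable to be chosen.

```python
import random
from statistics import mean


def mean_difference(xs, ys):
    """Hypothetical distance between two samples: absolute difference of means."""
    return abs(mean(xs) - mean(ys))


def permutation_p_value(real, synthetic, distance, n_permutations=2000, seed=0):
    """Estimate the p-value of a two-sample permutation test.

    Null hypothesis: `real` and `synthetic` come from the same distribution.
    The p-value is the fraction of random relabelings whose distance is at
    least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = distance(real, synthetic)
    pooled = list(real) + list(synthetic)
    n_real = len(real)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling of the pooled samples
        if distance(pooled[:n_real], pooled[n_real:]) >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Toy data (made up for illustration): one feature value per sample.
real = [0.1 * i for i in range(20)]
close_model = [0.1 * i + 0.05 for i in range(20)]  # nearly matches `real`
far_model = [0.1 * i + 5.0 for i in range(20)]     # clearly shifted

p_close = permutation_p_value(real, close_model, mean_difference)
p_far = permutation_p_value(real, far_model, mean_difference)

# Model comparison: prefer the model with the larger p-value, i.e. the one
# whose output is harder to distinguish from the real samples.
preferred = "close_model" if p_close > p_far else "far_model"
```

In this sketch a small p-value means the model's output is distinguishable from the real samples, so the model is rejected at the chosen significance level. Repeating the test over many simulated data sets and recording the rejection rate for each candidate distance function would give an empirical estimate of the power function.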
