Document image defect models and their uses

The accuracy of today's document recognition algorithms falls abruptly when image quality degrades even slightly. In an effort to surmount this barrier, researchers have in recent years intensified their study of explicit, quantitative, parameterized models of the image defects that occur during printing and scanning. The author reviews the recent literature and discusses the form these models might take. A preview of a large public-domain database of character images, labeled with ground-truth including all defect model parameters, is given. The use of massive pseudo-randomly generated training sets for the construction of high-performance decision trees for preclassification is described. In a more theoretical vein, the author reports preliminary results in the estimation of the intrinsic error of precisely specified text recognition problems. Finally, the author calls attention to some open problems.