Character-Based Automated Human Perception Quality Assessment in Document Images

Large degradations in document images impede their readability and degrade the performance of automated document processing systems. Document image quality (IQ) metrics have been defined through optical character recognition (OCR) accuracy. Such metrics, however, do not always correlate with human perception of IQ. When enhancing document images with the goal of improving readability, e.g., in historical documents where OCR performance is low and/or where it is necessary to preserve the original context, it is important to understand human perception of quality. The goal of this paper is to design a system that learns and estimates human perception of document IQ. The resulting metric can be used to compare existing document enhancement methods and to guide automated document enhancement. Moreover, the proposed methodology is designed as a general framework that can be used in a wide range of applications.
