Low-resolution photo/drawing classification: metrics, method and archiving optimization

Archiving and re-purposing are automated using zoning analysis, which performs segmentation (region boundary definition), classification (region typing) and bit-depth determination. For throughput reasons, zoning analysis is often performed on a low-resolution (e.g. 50-100 ppi) representation of the document. At these resolutions, classification must rely on heuristic metrics. Reported here are metrics for distinguishing photos from color drawings, and a novel classification technique based solely on the statistics of each heuristic metric. The statistical technique allows ready combination of multiple binary classifiers and provides a lower classification error than simple voting or metric-confidence techniques. It also permits new metrics to be added to improve the overall classification. The benefit of this technique for archival optimization is shown.
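The abstract does not give the details of the statistical technique, so the following is only a minimal sketch of one way per-metric statistics could be combined, not the method of the paper. It assumes each metric is modeled as an independent Gaussian per class (mean and standard deviation estimated from labeled regions) and that the per-metric log-likelihoods are summed. A simple majority vote over the same metrics is included for comparison. The function names, the three unnamed metrics and the training values are all hypothetical.

```python
import math
from statistics import mean, stdev


def fit_metric_stats(samples_by_class):
    """Estimate (mean, std) of every metric for every class.

    samples_by_class maps a class label to a list of metric vectors.
    """
    stats = {}
    for label, rows in samples_by_class.items():
        columns = list(zip(*rows))  # one column per metric
        stats[label] = [(mean(col), stdev(col)) for col in columns]
    return stats


def gaussian_log_pdf(x, mu, sigma):
    """Log density of a normal distribution (assumed model, not from the paper)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def classify_by_statistics(stats, features):
    """Pick the class with the highest summed per-metric log-likelihood."""
    scores = {
        label: sum(gaussian_log_pdf(x, mu, sd) for x, (mu, sd) in zip(features, params))
        for label, params in stats.items()
    }
    return max(scores, key=scores.get)


def classify_by_vote(stats, features):
    """Baseline: each metric votes for its most likely class; majority wins."""
    votes = {label: 0 for label in stats}
    for i, x in enumerate(features):
        winner = max(stats, key=lambda label: gaussian_log_pdf(x, *stats[label][i]))
        votes[winner] += 1
    return max(votes, key=votes.get)


# Hypothetical training data: three heuristic metric values per region.
training = {
    "photo": [(0.82, 0.30, 0.55), (0.75, 0.35, 0.60), (0.90, 0.25, 0.50), (0.78, 0.32, 0.58)],
    "drawing": [(0.20, 0.70, 0.45), (0.30, 0.65, 0.52), (0.15, 0.75, 0.40), (0.25, 0.68, 0.48)],
}
stats = fit_metric_stats(training)

region = (0.80, 0.31, 0.56)  # hypothetical metric values for an unseen region
label_stat = classify_by_statistics(stats, region)
label_vote = classify_by_vote(stats, region)
```

In this sketch, adding a new metric only means appending one more value to each vector. The summed log-likelihood weights each metric by how well it separates the classes, whereas the vote treats all metrics equally.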
