A color-based layout analysis to process censorship cards of film archives

Processing censorship cards of the 20th century to support annotation and retrieval poses a number of challenges for many document image analysis (DIA) systems. Problems due to the low layout quality and standard of such material can be reduced by exploiting the information conveyed by color. In this paper, taking into account lessons learned in the context of the IST project COLLATE, we propose a new method for image segmentation and layout analysis that takes full advantage of color information. The method has been implemented in the DIA system WISDOM++ and tested on a corpus of multi-format documents concerning historic film censorship.
