Content-Based Old Documents Indexing

A huge number of printed documents were published and distributed during the 15th century. To protect this heritage, a digitization campaign is under way for these documents. The mass of images that digitization generates makes the documents difficult to index and retrieve. This paper presents a French collaborative project that brings together seven laboratories around the content-based indexing of ancient documents. The project consists of two steps: 1) extracting information from the images and 2) retrieving documents that have a similar semantic interpretation for the user. Two participants in the project present here how they combine their work and their goals.
