Retrieval of Handwritten Lines in Historical Documents

This study describes methods for retrieving handwritten lines of text from a historical administrative collection. The goal is to develop generic methods for bootstrapping the retrieval system from a tabula rasa starting condition, i.e., the virtual absence of labeled samples. By exploiting currently available computing power and the fact that computation takes place offline, it should be possible to provide a good starting point for statistical learning methods. In this manner, a closed collection can be indexed incrementally. A cross-correlation method operating on line-strip images is presented, and its results are compared with those of feature-based methods.
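As a rough illustration of what cross-correlation on line-strip images can look like, the sketch below slides a query strip horizontally over each candidate strip and ranks the candidates by their best zero-mean normalized cross-correlation score. The abstract does not specify the correlation variant, the normalization, or any preprocessing, so all of these are assumptions, as are the function names `best_ncc` and `rank_strips` and the requirement that strips share the same pixel height.

```python
import numpy as np


def best_ncc(query, strip):
    """Slide `query` horizontally over `strip`.

    Returns (best_score, best_offset), where the score is the zero-mean
    normalized cross-correlation in [-1, 1]. Assumes both images are 2-D
    grayscale arrays of equal height (an assumption made for this sketch).
    """
    query = np.asarray(query, dtype=float)
    strip = np.asarray(strip, dtype=float)
    height, width = query.shape
    if strip.shape[0] != height or strip.shape[1] < width:
        raise ValueError("strip must match query height and be at least as wide")

    q = query - query.mean()
    q_norm = np.linalg.norm(q)

    best_score, best_offset = -1.0, 0
    for x in range(strip.shape[1] - width + 1):
        window = strip[:, x:x + width]
        w = window - window.mean()
        denom = q_norm * np.linalg.norm(w)
        # A flat (constant) window or query carries no shape information.
        score = float((q * w).sum() / denom) if denom > 0 else 0.0
        if score > best_score:
            best_score, best_offset = score, x
    return best_score, best_offset


def rank_strips(query, strips):
    """Return indices of `strips`, ordered from best to worst match."""
    scores = [best_ncc(query, s)[0] for s in strips]
    return sorted(range(len(strips)), key=lambda i: scores[i], reverse=True)
```

Because the computation is offline, an exhaustive sliding search like this one is affordable for a sketch; a larger collection would more likely use an FFT-based correlation, which is outside the scope of this example.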
