Handwritten document age classification based on handwriting styles

Handwriting styles change continuously over time. We address the novel problem of estimating the approximate age of historical handwritten documents from their handwriting style. Such a system has many applications in handwritten document processing engines, where specialized processing techniques can be applied based on the estimated age of a document. We propose to learn a distribution over styles across centuries using topic models and to apply a classifier over the learned weights in order to estimate the approximate age of a document. We compare different distance metrics, such as Euclidean distance and Hellinger distance, within this application.
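To make the distance comparison concrete, the following is a minimal sketch of how the two metrics could be computed between topic-weight vectors and used to assign a period label. It is an illustration only, not the classifier used in this work: the nearest-neighbour rule, the function names, the three-topic vectors, and the century labels are all assumptions introduced for the example. The Hellinger distance is taken in its standard form for discrete distributions, H(p, q) = (1/sqrt(2)) * ||sqrt(p) - sqrt(q)||_2.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length weight vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    Assumes p and q are non-negative and each sums to 1; the result lies in [0, 1].
    """
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2.0
    )


def nearest_label(query, labelled, distance):
    """Return the label of the labelled vector closest to `query`.

    `labelled` is a list of (vector, label) pairs. This nearest-neighbour rule
    is a stand-in chosen for illustration, not the paper's classifier.
    """
    return min(labelled, key=lambda item: distance(query, item[0]))[1]


# Hypothetical per-document topic weights (three topics) with made-up labels.
train = [
    ([0.7, 0.2, 0.1], "17th century"),
    ([0.1, 0.3, 0.6], "19th century"),
]
query = [0.6, 0.3, 0.1]

for name, metric in (("euclidean", euclidean), ("hellinger", hellinger)):
    print(name, nearest_label(query, train, metric))
```

Because topic weights form a probability distribution, the Hellinger distance compares square roots of the weights and is bounded by 1, whereas the Euclidean distance treats the weights as plain coordinates; the two can therefore rank neighbours differently on less clear-cut data than this toy example.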
