Evaluation of SVM, MLP and GMM Classifiers for Layout Analysis of Historical Documents

This paper compares three classifiers, based respectively on Support Vector Machines (SVM), Multi-Layer Perceptrons (MLP), and Gaussian Mixture Models (GMM), for detecting the physical structure of historical documents. Each classifier segments a scaled image of a historical document into four classes: periphery, background, text, and decoration. We evaluate the classifiers on three data sets of historical documents. Depending on the data set, the best classification rates range from 90.35% to 97.47%.
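
The following is a minimal pure-Python sketch of a generative per-class decision rule of the GMM kind, together with a classification-rate computation. It is not the paper's method. Each of the four classes is modeled by a single diagonal Gaussian, which is a one-component mixture fitted in closed form rather than by EM. Every pixel feature vector is assigned to the class with the highest log prior plus log-likelihood. The classification rate is taken, as an assumption, to be the percentage of correctly labeled pixels. The two-dimensional features, the toy data, and the names `fit_gaussians`, `predict` and `classification_rate` are hypothetical.

```python
import math
from collections import defaultdict

# The four zone classes named in the abstract.
CLASSES = ("periphery", "background", "text", "decoration")


def fit_gaussians(features, labels):
    """Fit one diagonal Gaussian per class: a one-component GMM, closed form (no EM)."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    models = {}
    for cls, xs in grouped.items():
        n, d = len(xs), len(xs[0])
        mean = [sum(x[j] for x in xs) / n for j in range(d)]
        # Floor the variance so a constant feature cannot cause division by zero.
        var = [max(sum((x[j] - mean[j]) ** 2 for x in xs) / n, 1e-6) for j in range(d)]
        log_prior = math.log(n / len(features))
        models[cls] = (mean, var, log_prior)
    return models


def log_likelihood(x, mean, var):
    """Log density of x under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def predict(models, x):
    """Assign x to the class with the highest log prior + log-likelihood."""
    return max(models, key=lambda c: models[c][2] + log_likelihood(x, models[c][0], models[c][1]))


def classification_rate(predicted, truth):
    """Percentage of pixels whose predicted class matches the ground truth."""
    correct = sum(p == t for p, t in zip(predicted, truth))
    return 100.0 * correct / len(truth)


# Hypothetical per-pixel features, e.g. (mean intensity, local contrast).
train_x = [(0.10, 0.10), (0.15, 0.12), (0.12, 0.08),   # periphery
           (0.90, 0.05), (0.85, 0.07), (0.88, 0.04),   # background
           (0.30, 0.80), (0.35, 0.75), (0.32, 0.85),   # text
           (0.60, 0.50), (0.65, 0.55), (0.55, 0.45)]   # decoration
train_y = [c for c in CLASSES for _ in range(3)]

models = fit_gaussians(train_x, train_y)
test_x = [(0.12, 0.10), (0.87, 0.06), (0.33, 0.80), (0.60, 0.50)]
predicted = [predict(models, x) for x in test_x]
rate = classification_rate(predicted, list(CLASSES))
print(predicted, f"{rate:.2f}%")
# → ['periphery', 'background', 'text', 'decoration'] 100.00%
```

A full GMM classifier would fit several components per class with EM. The SVM and MLP variants would replace `predict` with a discriminative model. If the rate is measured per pixel, as assumed here, the `classification_rate` step would be the same for all three classifiers.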
