Image-based historical manuscript dating using contour and stroke fragments

Historical manuscript dating has always been an important challenge for historians, but since countless manuscripts have recently become digitally available, the pattern recognition community has started addressing the dating problem as well. In this paper, we present a family of local contour fragment (kCF) and stroke fragment (kSF) features and study their application to historical document dating. A kCF is formed from k primary contour fragments segmented from the connected-component contours of handwritten text, and a kSF is formed by a segment of length k of a stroke fragment graph. The kCF and kSF are described by scale- and rotation-invariant descriptors and encoded into trained codebooks, inspired by the classical bag-of-words model. We evaluate our methods on the Medieval Paleographical Scale (MPS) data set and perform dating by writer identification and by classification. For dating by writer identification, we conclude that features which perform well for writer identification are not necessarily suitable for historical document dating. Experimental results of dating by classification demonstrate that a combination of kCF and kSF achieves the best results, with a mean absolute error of 14.9 years when excluding writer duplicates in training and 7.9 years when including writer duplicates in training.

Highlights
A new image-based historical manuscript dating problem is proposed.
We present a family of local contour fragment and stroke fragment features.
Historical manuscript dating is performed by writer identification and classification.
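As a rough illustration of two ideas named in the abstract, the sketch below shows bag-of-words encoding of fragment descriptors against a codebook, and the mean absolute error used to score dating. It is not the authors' implementation: the function names, the two-dimensional toy descriptors, the hand-picked codebook, and the example years are all assumptions made for illustration, and the paper's codebooks are trained rather than fixed by hand.

```python
import math

def encode_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword (Euclidean distance)
    and return the normalized histogram of codeword counts."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda i: math.dist(d, codebook[i]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

def mean_absolute_error(true_years, predicted_years):
    """Average absolute difference between true and predicted years."""
    return sum(abs(t - p) for t, p in zip(true_years, predicted_years)) / len(true_years)

# Hypothetical toy data: three codewords and four fragment descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (3.7, 0.2)]
hist = encode_histogram(descriptors, codebook)

# Hypothetical true and predicted years for three documents.
mae = mean_absolute_error([1300, 1350, 1400], [1310, 1340, 1425])

print(hist)  # [0.25, 0.5, 0.25]
print(mae)   # 15.0
```

In this sketch the histogram would serve as the document-level feature vector passed to a classifier, and the error is reported in years, matching the unit used in the abstract.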
