Automatic Scribe Attribution for Medieval Manuscripts

We propose an automatic method for attributing manuscript pages to scribes. The system works on digital images as published by libraries. The attribution process begins by extracting approximately letter-sized components from each query page. This is done by means of binarization (ink-background separation), connected component labelling, and further segmentation guided by the estimated typical stroke width. Components are extracted in the same way from pages of known scribal origin, which allows us to assign a scribe to each query component by nearest-neighbour classification. Dissimilarity between components is measured as the Euclidean distance between simple feature vectors that capture the distribution of ink within the bounding box of each component. The resulting set of component-level scribe attributions, typically several hundred for a page, is then used to predict the scribe of the page by means of a voting procedure: the scribe who receives the largest number of votes among the 120 strongest component attributions is proposed as the scribe of the page. The process also allows the argument behind an attribution to be visualized for a human reader, since the writing components of the query page are exhibited alongside the matching components of the known pages. The resulting report is thus open to inspection and analysis using the methods and intuitions of traditional palaeography. The system was evaluated on a data set covering 46 medieval scribes writing in Carolingian minuscule, Bastarda, and a few other scripts. It achieved a mean top-1 accuracy of 98.3%, that is, for the first scribe proposed for each page, when the labelled data comprised one randomly selected page from each scribe and nine unseen pages from each scribe were to be attributed in the validation procedure. The experiment was repeated 50 times to even out the effects of random variation.
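
The classification and voting steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the grid-based ink-density feature, the reading of "strongest" attributions as those with the smallest nearest-neighbour distance, and all function names are assumptions made for illustration only. Components are taken to be already binarized and cropped to their bounding boxes.

```python
from collections import Counter
from math import dist


def ink_grid_features(component, grid=2):
    """Ink density per cell of a grid x grid partition of the bounding box.

    `component` is a list of rows of 0/1 values (1 = ink). This particular
    feature is an assumed stand-in for the "distribution of ink" features.
    """
    h, w = len(component), len(component[0])
    features = []
    for i in range(grid):
        r0, r1 = i * h // grid, (i + 1) * h // grid
        for j in range(grid):
            c0, c1 = j * w // grid, (j + 1) * w // grid
            cell = [component[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


def nearest_scribe(query_vec, labelled):
    """Return (distance, scribe) for the labelled component nearest to query_vec."""
    return min((dist(query_vec, vec), scribe) for vec, scribe in labelled)


def attribute_page(query_vecs, labelled, n_votes=120):
    """Vote over the n_votes component attributions with the smallest distance.

    Treating a small distance as a "strong" attribution is an assumption.
    """
    attributions = sorted(nearest_scribe(q, labelled) for q in query_vecs)
    votes = Counter(scribe for _, scribe in attributions[:n_votes])
    return votes.most_common(1)[0][0], votes


# Toy data: feature vectors of components from pages of known scribes.
labelled = [((0.10, 0.20), "A"), ((0.15, 0.25), "A"), ((0.90, 0.80), "B")]
# Feature vectors of components extracted from the query page.
query = [(0.12, 0.22), (0.88, 0.79), (0.20, 0.20), (0.50, 0.50)]

scribe, votes = attribute_page(query, labelled, n_votes=3)
print(scribe, dict(votes))
```

Restricting the vote to the closest matches means that poorly matched components, such as segmentation debris or rare letter forms, do not influence the page-level decision.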
