Automated Image Segmentation Methods for Digitally-Assisted Palaeography of Medieval Manuscripts

We explore methods of automating the digital palaeographic process using a divide-and-conquer approach. First, image noise is reduced using a combination of colour removal and varied blurring and thresholding techniques. Initial values for these processes are calculated by the system from the average greyscale value of the image when it is first imported. By combining these algorithms, the system achieves a high level of noise reduction. Segmenting the script into letters is likewise divided into stages. First, blocks of text are detected in the noise-reduced image. The system measures the proportion of black pixels within blocks of a predefined size and compares each value with the average values of both the entire image and the surrounding blocks, which reduces false positives. These text blocks are split into individual lines by detecting whitespace, and each line is then segmented into individual letters using a similar technique. To verify the integrity of the letters, the width of each segment is compared with the average letter width, since most letters within a manuscript are of similar width. Any segment that differs excessively from this average is re-checked by re-running the segmentation algorithms at that location with the threshold set to both lighter and darker levels. The results of these segmentations are then merged, and each box is finally expanded to fit its letter more precisely.
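
The noise-reduction stage can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the system's implementation. The image is taken to be a list of rows of (r, g, b) tuples. Colour removal uses the common luminance weights 0.299/0.587/0.114, and the blur is a simple box blur. The initial threshold is assumed to be a fixed fraction (here 0.85) of the mean grey level of the image as imported. The function names to_grey, box_blur, initial_threshold, binarise and reduce_noise are hypothetical.

```python
# Hypothetical sketch of the noise-reduction stage: colour removal, blurring,
# then thresholding with a starting value derived from the mean grey level.

def to_grey(rgb):
    """Convert rows of (r, g, b) tuples (0-255) to greyscale luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def box_blur(grey, radius=1):
    """Replace each pixel by the mean of its (2*radius+1)^2 window, clipped at the edges."""
    h, w = len(grey), len(grey[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [grey[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

def initial_threshold(grey, factor=0.85):
    """Starting threshold as a fraction of the mean grey level (the factor is an assumption)."""
    total = sum(sum(row) for row in grey)
    return factor * total / (len(grey) * len(grey[0]))

def binarise(grey, threshold):
    """Mark pixels darker than the threshold as ink (1); everything else is background (0)."""
    return [[1 if v < threshold else 0 for v in row] for row in grey]

def reduce_noise(rgb, radius=1, factor=0.85):
    grey = to_grey(rgb)
    threshold = initial_threshold(grey, factor)  # taken from the image as imported
    return binarise(box_blur(grey, radius), threshold)
```

Blurring before thresholding is what removes isolated specks. A single dark pixel is averaged with its light neighbours and ends up above the threshold, whereas a stroke several pixels wide stays dark. The cost is that very thin strokes are lightened as well, so the blur radius and threshold factor would need tuning for each manuscript.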
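
One possible reading of the text-block detection step is sketched below, operating on a binary image (1 = ink) such as a thresholded page. The comparison rule is an assumption about how the comparison with surrounding blocks might work. Under this rule, a block counts as text if its ink density exceeds the whole-image density and its neighbours, on average, exceed a fraction of that density, so an isolated stain is rejected. The block size, the neighbour_ratio parameter and the function names are all hypothetical.

```python
# Hypothetical sketch of text-block detection by ink density in fixed-size blocks.

def block_densities(binary, size):
    """Return a grid of ink proportions for non-overlapping size x size blocks."""
    h, w = len(binary), len(binary[0])
    grid = []
    for by in range(0, h, size):
        row = []
        for bx in range(0, w, size):
            cells = [binary[y][x]
                     for y in range(by, min(by + size, h))
                     for x in range(bx, min(bx + size, w))]
            row.append(sum(cells) / len(cells))
        grid.append(row)
    return grid

def detect_text_blocks(binary, size=8, neighbour_ratio=0.5):
    """Return (block_row, block_col) pairs judged to contain text.

    Assumed rule: the block's density must exceed the global density, and the
    mean density of its (up to 8) neighbours must exceed
    neighbour_ratio * global density, so isolated dark blocks are discarded.
    """
    grid = block_densities(binary, size)
    gh, gw = len(grid), len(grid[0])
    global_density = sum(sum(row) for row in binary) / (len(binary) * len(binary[0]))
    found = []
    for i in range(gh):
        for j in range(gw):
            neighbours = [grid[a][b]
                          for a in range(max(0, i - 1), min(gh, i + 2))
                          for b in range(max(0, j - 1), min(gw, j + 2))
                          if (a, b) != (i, j)]
            local = sum(neighbours) / len(neighbours) if neighbours else 0.0
            if grid[i][j] > global_density and local > neighbour_ratio * global_density:
                found.append((i, j))
    return found
```

Requiring support from neighbouring blocks trades a little recall at the ragged edges of a text area for fewer false positives from blots, stains or decoration that happen to be dark.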
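
Splitting by whitespace can be illustrated with projection profiles. Summing ink along each row leaves runs of zeros between lines, and summing along each column within a line leaves zeros between letters. The sketch below makes several assumptions. Letters are assumed not to touch, so touching letters produce one wide box, which is what the width check is meant to catch. Each box is simply trimmed vertically to the ink it contains, as a stand-in for the final box-fitting step. The tolerance is an assumed 50% of the mean width. The function names are hypothetical, and the lighter/darker re-thresholding of flagged boxes is not shown.

```python
# Hypothetical sketch: lines and letters from whitespace gaps, plus a width check.

def runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def segment_letters(binary):
    """Return letter boxes as (top, bottom, left, right), with bottom/right exclusive."""
    boxes = []
    for top, bottom in runs([sum(row) for row in binary]):          # blank rows separate lines
        line = binary[top:bottom]
        for left, right in runs([sum(col) for col in zip(*line)]):  # blank columns separate letters
            # Fit the box vertically to the rows that contain ink in this column span.
            rows = [y for y in range(top, bottom) if any(binary[y][left:right])]
            boxes.append((rows[0], rows[-1] + 1, left, right))
    return boxes

def suspicious_widths(boxes, tolerance=0.5):
    """Return boxes whose width differs from the mean by more than tolerance * mean (assumed rule)."""
    widths = [right - left for (_, _, left, right) in boxes]
    mean = sum(widths) / len(widths)
    return [box for box, w in zip(boxes, widths) if abs(w - mean) > tolerance * mean]
```

A box roughly twice the average width is typical of two letters that touch, while a very narrow box is often a fragment of a broken stroke. Both are flagged as candidates for re-segmentation.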