Collecting historical font metrics from Google Books

A system is presented for extracting key metrics from the fonts used in historical documents. The system identifies important landmarks on a page, such as margins, paragraphs, and lines, and applies frequency analysis to determine the relevant sizes. It was validated against the measurements of a human expert on randomly selected samples; on average, its x-height, body size, and line spacing measurements differed from the expert's by less than 5%.
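
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what frequency analysis over page landmarks might look like, not the system's actual method. It assumes a binarized page given as rows of 0/1 pixels, locates text lines from a horizontal ink profile, and takes the most frequent distance between consecutive line tops as the line spacing. All function names and the synthetic page are hypothetical and introduced purely for illustration.

```python
from collections import Counter


def row_ink_profile(page):
    """Count ink pixels in each row of a binarized page (1 = ink, 0 = paper)."""
    return [sum(row) for row in page]


def find_text_lines(profile, min_ink=1):
    """Return (top, bottom) row indices for each run of consecutive inked rows."""
    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a new text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))   # the current text line ends
            start = None
    if start is not None:                  # line running to the page bottom
        lines.append((start, len(profile) - 1))
    return lines


def dominant_line_spacing(lines):
    """Most frequent distance between the tops of consecutive lines."""
    gaps = [b[0] - a[0] for a, b in zip(lines, lines[1:])]
    if not gaps:
        return None
    return Counter(gaps).most_common(1)[0][0]


def make_synthetic_page(width=20, line_height=3, spacing=5, n_lines=4):
    """Build a toy page: solid bands of ink separated by blank rows (assumed data)."""
    page = []
    for _ in range(n_lines):
        page += [[1] * width for _ in range(line_height)]
        page += [[0] * width for _ in range(spacing - line_height)]
    return page


page = make_synthetic_page()
lines = find_text_lines(row_ink_profile(page))
spacing = dominant_line_spacing(lines)
print(lines, spacing)
```

Taking the most frequent gap rather than the mean keeps the estimate stable when a page mixes body text with headings or paragraph breaks, since occasional unusual gaps do not shift the mode. Real scans would additionally need deskewing, noise filtering, and a tolerance when grouping near-equal gaps.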
