Paper information - OCR of handwritten transcriptions of Ancient Egyptian hieroglyphic text

OCR of handwritten transcriptions of Ancient Egyptian hieroglyphic text

Encoding hieroglyphic texts is time-consuming. If a text already exists as a handwritten transcription, OCR offers an alternative. Off-the-shelf OCR systems seem difficult to adapt to the peculiarities of Ancient Egyptian. This paper presents a proof-of-concept tool designed to digitize texts of Urkunden IV in the handwriting of Kurt Sethe. It automatically recognizes signs and produces a normalized encoding that requires little manual correction and is suitable for storage in a database, or for rendering on a screen or on paper. The hieroglyphic text is encoded in RES (Revised Encoding Scheme) rather than in (common dialects of) MdC (Manuel de Codage). Earlier papers argued against MdC and in favour of RES for corpus development; arguments in favour of RES include the longevity of the encoding, as its semantics are font-independent. The present study provides evidence that RES is also much preferable to MdC in the context of OCR: with a well-understood parsing technique, the relative positioning of scanned signs can be mapped straightforwardly to suitable primitives of the encoding.
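To make the last point of the abstract concrete, the following is a minimal sketch of how bounding boxes of recognized signs might be turned into an encoding string. It is not the tool described in the paper: it uses a plain recursive X-Y cut rather than whatever parsing technique the author employs, and it assumes only three positioning operators (`-` between top-level groups, `:` for vertical stacking, `*` for horizontal juxtaposition inside a group, with parentheses for nesting). The `Box` type, the function names, and the sign labels are all hypothetical, and the full encoding has further primitives that are not modelled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Bounding box of one recognized sign (hypothetical input format).

    Image coordinates are assumed: y grows downward.
    """
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def split_runs(boxes: List[Box], vertical: bool) -> List[List[Box]]:
    """Partition boxes into runs separated by a clear gap along one axis."""
    lo = (lambda b: b.y0) if vertical else (lambda b: b.x0)
    hi = (lambda b: b.y1) if vertical else (lambda b: b.x1)
    ordered = sorted(boxes, key=lo)
    runs = [[ordered[0]]]
    reach = hi(ordered[0])  # furthest extent of the current run
    for b in ordered[1:]:
        if lo(b) >= reach:
            runs.append([b])      # gap found: start a new run
        else:
            runs[-1].append(b)    # overlaps the current run
        reach = max(reach, hi(b))
    return runs


def group(boxes: List[Box], vertical: bool) -> str:
    """Encode one run, alternating between vertical and horizontal cuts."""
    if len(boxes) == 1:
        return boxes[0].name
    runs = split_runs(boxes, vertical)
    if len(runs) == 1:
        # The caller already failed to cut these boxes on the other axis.
        raise ValueError("signs overlap on both axes; not handled in this sketch")
    parts = [group(run, not vertical) for run in runs]
    if vertical:
        return ":".join(parts)
    # A vertical subgroup inside a horizontal group needs parentheses.
    return "*".join(f"({p})" if ":" in p else p for p in parts)


def encode(boxes: List[Box]) -> str:
    """Cut a line of signs into top-level groups and join them with '-'."""
    return "-".join(group(run, True) for run in split_runs(boxes, False))


# Arbitrary sign labels, chosen only for illustration.
line = [
    Box("G17", 0, 0, 10, 20),
    Box("D21", 12, 0, 22, 9),
    Box("X1", 12, 11, 16, 20),
    Box("N35", 17, 11, 22, 20),
    Box("A1", 25, 0, 35, 20),
]
print(encode(line))  # G17-D21:X1*N35-A1
```

The sketch only illustrates why an encoding whose primitives describe relative position is convenient for OCR output: each cut in the image corresponds directly to one operator in the string. Real handwriting would also need tolerance for touching or overlapping strokes, which this version rejects with an error.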

Mark-Jan Nederhof

[1] K. Sethe. Urkunden der 18. Dynastie, I, 1914.

[2] M. Nederhof. The Manuel de Codage encoding of hieroglyphs impedes development of corpora, 2012.

[3] Mark-Jan Nederhof. Automatic Creation of Interlinear Text for Philological Purposes, 2009, Trait. Autom. des Langues.

[4] Hermann Ney, et al. Deformation Models for Image Recognition, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5] Joan-Andreu Sánchez, et al. Recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models, 2014, Pattern Recognit. Lett.

[6] Mark-Jan Nederhof, et al. A Probabilistic Model of Ancient Egyptian Writing, 2017, FSMNLP.

[7] Jan C. van Gemert, et al. Automatic Egyptian hieroglyph recognition by retrieving images as texts, 2013, ACM Multimedia.

[8] B. Cumming, et al. Egyptian Historical Records of the Later Eighteenth Dynasty, 1983.