Using a hidden Markov model to transcribe handwritten Bushman texts

The Bushman texts in the Bleek and Lloyd Collection contain complex diacritics that make automatic transcription difficult. Transcriptions of these texts would make it possible to build enhanced digital library services for interacting with the collection. This study investigates automatic transcription of the Bushman texts using a hidden Markov model for text line recognition, a popular method for this task. The results show that while this technique may be well suited to well-constrained, well-understood scripts, applying it to more complex scripts introduces a number of difficulties that need to be overcome.