Translating handwritten Bushman texts

The Bleek and Lloyd Collection is a collection of artefacts documenting the life and language of the Bushman people of southern Africa in the 19th century. The collection includes a handwritten dictionary of English words and their corresponding translations in the |xam Bushman language. In principle, this dictionary allows |xam words that appear in the notebooks of the Bleek and Lloyd Collection to be translated by hand. In practice, manual lookup is impractical because the dictionary contains over 14,000 entries. To solve this problem, a content-based image retrieval system was built: the user selects a |xam word from a notebook, and the system returns matching words from the dictionary. The system shows promise, with some search keys returning relevant results.
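The retrieval idea described above can be illustrated with a minimal sketch. The abstract does not say which image features or distance measure the system uses, so everything here is an assumption made for illustration: word images are taken to be already segmented and binarised (rows of 0/1 pixels), each image is reduced to a column ink profile, and dictionary entries are ranked by Euclidean distance to the query. The names `column_profile`, `distance`, and `search` are hypothetical and are not taken from the system itself.

```python
from math import sqrt


def column_profile(image, bins=8):
    """Reduce a binary word image (rows of 0/1) to a fixed-length, normalised ink profile.

    Assumption: this simple feature stands in for whatever the real system extracts.
    """
    width = len(image[0])
    # Count ink pixels in every column of the image.
    columns = [sum(row[x] for row in image) for x in range(width)]
    profile = []
    for b in range(bins):
        # Average the columns that fall into this bin (at least one column per bin).
        start = b * width // bins
        end = max(start + 1, (b + 1) * width // bins)
        chunk = columns[start:end]
        profile.append(sum(chunk) / len(chunk))
    # Normalise so that overall ink density does not dominate the comparison.
    total = sum(profile) or 1.0
    return [p / total for p in profile]


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query_image, dictionary, top_k=3):
    """Return the labels of the top_k dictionary images closest to the query image."""
    query = column_profile(query_image)
    scored = [
        (distance(query, column_profile(image)), label)
        for label, image in dictionary.items()
    ]
    scored.sort()  # smallest distance first
    return [label for _, label in scored[:top_k]]


# Toy data: two hypothetical dictionary entries and a query cut from a notebook page.
dictionary = {
    "entry_a": [[1, 1, 0, 0], [1, 1, 0, 0]],
    "entry_b": [[0, 0, 1, 1], [0, 0, 1, 1]],
}
query = [[1, 1, 0, 0], [1, 0, 0, 0]]
ranking = search(query, dictionary, top_k=2)
```

Returning a ranked list rather than a single answer matches the abstract's description of the system returning "matching words": the user inspects the candidates and picks the correct dictionary entry.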
