Ottoman archives explorer: A retrieval system for digital Ottoman archives

This article presents Ottoman Archives Explorer, a Content-Based Retrieval (CBR) system based on character recognition for printed and handwritten historical documents. Several methods are investigated for the character segmentation and recognition stages. In particular, sliding-window and histogram segmentation methods are coupled with recognition approaches using spatial features, neural networks, and a graph-based model. The prototype system provides CBR of document images through both example-based queries and a virtual keyboard for constructing query words.
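The abstract names histogram segmentation without describing it. As a rough illustration only, the sketch below shows one common form of the idea, a vertical projection profile over a binarized line image. It is not the article's implementation: the function names, the 0/1 image representation, and the `threshold` parameter are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumption: a binarized text-line image given as rows of 0 (background)
# and 1 (ink). This representation is hypothetical, not taken from the article.
Image = List[List[int]]


def column_histogram(image: Image) -> List[int]:
    """Count ink pixels in each column (the vertical projection profile)."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def segment_by_histogram(image: Image, threshold: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, whose ink count
    exceeds `threshold`. Columns at or below the threshold act as cut points."""
    hist = column_histogram(image)
    segments: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(hist):
        if count > threshold and start is None:
            start = x  # a new segment begins at this column
        elif count <= threshold and start is not None:
            segments.append((start, x))  # segment ends before this column
            start = None
    if start is not None:
        segments.append((start, len(hist)))  # segment runs to the right edge
    return segments


line_image = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1, 0],
]
print(segment_by_histogram(line_image))
```

With `threshold=0` only fully blank columns separate segments. Because Ottoman script is cursive, adjacent characters are often joined by thin strokes, so a practical system would likely need a nonzero threshold or further analysis to cut at those connections. The choice here is illustrative.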