Computer Aided Indexing of Historical Manuscripts

Arabic manuscripts represent a rich source of knowledge that has been highly underutilized. Huge repositories of historical artifacts are yet to be typeset and published in book form. Given the vast content of these manuscripts, it is important to develop indexing systems that support content-based retrieval from historical manuscripts. In this paper, we propose a computer aided retrieval and indexing system for Arabic historical manuscripts. The proposed system extracts meaningful information (features) that is used for indexing. Some preprocessing steps are also implemented to enhance the quality of the document images. More than one form of similarity measure was tested. The developed prototype system has shown encouraging results with respect to the word matching rates achieved.
