Binarization, character extraction, and writer identification of historical Hebrew calligraphy documents

We present our work on a paleographic analysis and recognition system for processing historical Hebrew calligraphy documents. The main goal is to analyze documents written in different styles in order to identify the locations, dates, and writers of test documents. Using interactive software tools, we have built a database of extracted characters. It currently contains about 20,000 characters from 34 different writers and will be substantially expanded in the near future. We present preliminary results on the automatic extraction of pre-specified letters using the morphological erosion operator. We further propose and test topological features for handwriting style classification, computed on a selected subset of the Hebrew alphabet. A writer identification experiment with 34 writers yielded 100% correct classification.
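The abstract does not say how the erosion operator is applied to letter extraction or which topological features are used. The sketch below is therefore only an illustration under stated assumptions. It assumes a binarized page stored as a list of rows (1 = ink, 0 = background). It erodes that page with the target letter's own binary template as the structuring element, with the origin at the template's top-left corner. Positions that survive erosion are reported as candidate letter locations. As one example of a topological feature, it also counts holes, meaning background regions fully enclosed by ink. The names `erode`, `find_letter`, and `count_holes` are hypothetical and are not taken from the described system.

```python
from collections import deque

Image = list[list[int]]  # 1 = ink (foreground), 0 = background


def erode(image: Image, se: Image) -> Image:
    """Binary erosion with the structuring element's origin at its top-left corner.

    Output pixel (r, c) is 1 iff every foreground pixel of `se`, placed with its
    top-left corner at (r, c), falls on ink in `image`.
    """
    h, w = len(image), len(image[0])
    sh, sw = len(se), len(se[0])
    offsets = [(i, j) for i in range(sh) for j in range(sw) if se[i][j]]
    out = [[0] * w for _ in range(h)]
    for r in range(h - sh + 1):
        for c in range(w - sw + 1):
            if all(image[r + i][c + j] for i, j in offsets):
                out[r][c] = 1
    return out


def find_letter(page: Image, template: Image) -> list[tuple[int, int]]:
    """Hypothetical helper: top-left positions where the letter template fits in the ink."""
    eroded = erode(page, template)
    return [(r, c) for r, row in enumerate(eroded) for c, v in enumerate(row) if v]


def count_holes(image: Image) -> int:
    """Count 4-connected background components that do not touch the image border."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    holes = 0
    for r in range(h):
        for c in range(w):
            if image[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one background component.
            touches_border = False
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not image[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border:
                holes += 1
    return holes


if __name__ == "__main__":
    # Toy page: a closed ring on the left, a shape open on the right side.
    page = [
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 1, 1, 0],
        [0, 1, 0, 1, 0, 1, 0, 0],
        [0, 1, 1, 1, 0, 1, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
    ]
    ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
    print(find_letter(page, ring))  # [(1, 1)]
    print(count_holes(page))        # 1
```

Plain erosion constrains only the template's ink pixels. The ring template above would therefore also match a solid 3×3 blob. A practical extractor would need extra checks, for example a hit-or-miss transform that also requires background where the template has background, or verification with features such as the hole count.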
