Word segmentation in handwritten Korean text lines based on gap clustering techniques

We propose a word segmentation method for handwritten Korean text lines. The method uses gap information to separate a text line into words, where a gap is defined as a white-run obtained from the vertical projection of the line image. A clustering technique classifies each gap as either a between-word gap or a within-word gap. We consider three gap metrics, namely the bounding box (BB), run-length/Euclidean (RLE) and convex hull (CH) distances, which are known to perform well for word segmentation of Roman-script handwriting. We pair them with three clustering techniques: the average linkage method, the modified MAX method and sequential clustering. An experiment on 498 text-line images extracted from live mail pieces showed that the best performance was obtained by sequential clustering using all three gap metrics.
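The gap-based pipeline described above (vertical projection, white-run gaps, two-class gap clustering) can be illustrated with a minimal sketch. Several assumptions apply. The input is a binary line image (1 = ink). The only gap metric is the raw white-run width, not the BB, RLE or CH distances. The clustering is textbook agglomerative average linkage stopped at two clusters, and the wider cluster is treated as between-word gaps. All function names are hypothetical; this is not the authors' implementation, and it does not cover the modified MAX method or sequential clustering.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # hypothetical input: binary line image, 1 = ink, 0 = background


def vertical_projection(image: Image) -> List[int]:
    """Count ink pixels in every column of the line image."""
    return [sum(column) for column in zip(*image)]


def find_gaps(profile: List[int]) -> Tuple[List[Tuple[int, int]], Optional[Tuple[int, int]]]:
    """Return (start, width) of each white-run between the first and last ink
    columns, plus the (first, last) ink extent; blank margins are ignored."""
    ink = [x for x, v in enumerate(profile) if v > 0]
    if not ink:
        return [], None
    gaps: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x in range(ink[0], ink[-1] + 1):
        if profile[x] == 0:
            if start is None:
                start = x  # a white-run begins
        elif start is not None:
            gaps.append((start, x - start))  # the white-run ends at an ink column
            start = None
    return gaps, (ink[0], ink[-1])


def average_linkage_two_clusters(values: List[float]) -> List[int]:
    """Agglomerative average-linkage clustering stopped at two clusters.
    Label 1 marks the cluster with the larger mean (assumed between-word)."""
    clusters = [[i] for i in range(len(values))]
    while len(clusters) > 2:
        best: Optional[Tuple[float, int, int]] = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                total = sum(abs(values[i] - values[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(values)
    if len(clusters) == 2:
        means = [sum(values[i] for i in c) / len(c) for c in clusters]
        wide = 0 if means[0] > means[1] else 1
        for i in clusters[wide]:
            labels[i] = 1
    return labels


def segment_words(image: Image) -> List[Tuple[int, int]]:
    """Split a line into (first_col, last_col) word spans at between-word gaps."""
    gaps, extent = find_gaps(vertical_projection(image))
    if extent is None:
        return []
    labels = average_linkage_two_clusters([float(w) for _, w in gaps])
    words: List[Tuple[int, int]] = []
    current = extent[0]
    for (start, width), label in zip(gaps, labels):
        if label == 1:
            words.append((current, start - 1))  # close the word before this gap
            current = start + width
    words.append((current, extent[1]))
    return words


# Usage: a toy two-row "line" with one wide gap and two narrow gaps.
rows = ["##.##.....###.#",
        "#..##......##.#"]
image = [[1 if c == "#" else 0 for c in row] for row in rows]
print(segment_words(image))  # [(0, 4), (10, 14)]
```

Note that forcing exactly two clusters always declares at least one between-word gap whenever a line has two or more gaps, so a sketch like this would still need some test for single-word lines.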
