Recognition-based handwritten Chinese character segmentation using a probabilistic Viterbi algorithm

Abstract This paper presents a recognition-based segmentation method for handwritten Chinese characters. Possible non-linear segmentation paths are first located using a probabilistic Viterbi algorithm. Candidate segmentation paths are then selected by checking overlapping paths, between-character gaps, and adjacent-path distances. A segmentation graph is constructed in which each candidate path is a node, and two nodes separated by an appropriate distance are connected by an arc. The cost of each arc is a function of the character recognition distance, the squareness of the character, and the internal gaps in the character. The nodes on the shortest path through the segmentation graph represent the optimal segmentation paths. For the experiments, 125 text-line images were collected from seven form documents; together these text-lines contain 1132 handwritten Chinese characters. The average segmentation rate in our experiments is 95.58%. Moreover, a slightly modified version of the probabilistic Viterbi algorithm extracts text-lines from document pages by finding non-linear paths when the gaps between text-lines are not obvious. The approach can also be adapted to segment characters from printed text-line images by adjusting the parameters that define the arc costs in the segmentation graph.
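The shortest-path step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: candidate paths are reduced to hypothetical x-positions, the function name `best_segmentation`, the `min_width`/`max_width` connection rule, and the placeholder `toy_cost` are all assumptions made for illustration. In the paper the arc cost combines recognition distance, squareness, and internal gaps; here only a squareness-like width penalty stands in for it.

```python
from typing import Callable, List, Sequence, Tuple


def best_segmentation(
    cuts: Sequence[int],
    arc_cost: Callable[[int, int], float],
    min_width: int,
    max_width: int,
) -> Tuple[float, List[int]]:
    """Shortest path from the first to the last candidate cut.

    cuts     : sorted x-positions of candidate segmentation paths
               (a 1-D stand-in for the non-linear paths; assumption).
    arc_cost : cost of treating the span between two cuts as one character.
    min_width, max_width : two cuts are connected by an arc only if their
               distance lies in this range (hypothetical connection rule).
    """
    n = len(cuts)
    inf = float("inf")
    best = [inf] * n   # best[j]: cheapest cost of reaching cut j
    prev = [-1] * n    # prev[j]: predecessor of cut j on that cheapest path
    best[0] = 0.0

    # Cuts are sorted left to right, so the graph is acyclic and a single
    # forward pass of dynamic programming finds the shortest path.
    for j in range(1, n):
        for i in range(j):
            width = cuts[j] - cuts[i]
            if best[i] == inf or not (min_width <= width <= max_width):
                continue
            cost = best[i] + arc_cost(cuts[i], cuts[j])
            if cost < best[j]:
                best[j] = cost
                prev[j] = i

    if best[-1] == inf:
        raise ValueError("no valid segmentation connects the first and last cut")

    # Walk the predecessor links back from the last cut.
    path: List[int] = []
    k = n - 1
    while k != -1:
        path.append(cuts[k])
        k = prev[k]
    path.reverse()
    return best[-1], path


def toy_cost(left: int, right: int, nominal: int = 40) -> float:
    """Placeholder cost: penalize widths far from a nominal character width."""
    return abs((right - left) - nominal) / nominal
```

For example, calling `best_segmentation([0, 18, 40, 62, 80, 120], toy_cost, min_width=10, max_width=60)` selects the cuts `[0, 40, 80, 120]`, since each resulting segment matches the nominal width of 40. A real arc cost would call a character recognizer on the image region between the two paths.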
