Word segmentation of handwritten text using supervised classification techniques

Recent work on extracting features of gaps in handwritten text allows these gaps to be classified as inter-word or intra-word using suitable classification techniques. In this paper, we first analyse the gap features using mutual information. We then investigate the underlying data distribution using visualisation methods. These suggest that the data have a complicated structure, which makes the gaps difficult to separate into two distinct classes. We apply five supervised classification algorithms from the machine learning field to both the original dataset and a dataset restricted to the best features selected using mutual information. Moreover, we improve the classification result by adding a set of feature variables describing the strokes preceding and following each gap. The classifiers are compared using McNemar's test. We find that SVMs and MLPs outperform the other classifiers and that preprocessing to select features works well. The best classification result attained suggests that the technique we employ is particularly suitable for digital ink manipulation at the level of words.
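
To illustrate the classifier comparison mentioned above, the following is a minimal sketch of McNemar's test in its commonly used continuity-corrected form. It is not the code used in this work: the function name `mcnemar_test` and the toy labels are hypothetical, and the p-value is taken from the chi-square distribution with one degree of freedom, which is only an approximation when the number of disagreements is small.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar's test for two classifiers.

    Returns (statistic, p_value). Only the gaps on which exactly one
    of the two classifiers is correct contribute to the statistic.
    """
    n01 = 0  # classifier A wrong, classifier B right
    n10 = 0  # classifier A right, classifier B wrong
    for t, a, b in zip(y_true, pred_a, pred_b):
        a_ok = (a == t)
        b_ok = (b == t)
        if a_ok and not b_ok:
            n10 += 1
        elif b_ok and not a_ok:
            n01 += 1

    if n01 + n10 == 0:
        # The classifiers never disagree in correctness: no evidence
        # of a difference between them.
        return 0.0, 1.0

    stat = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
    # Survival function of chi-square with 1 degree of freedom:
    # P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Toy example (hypothetical labels): 1 = inter-word gap, 0 = intra-word gap.
y_true = [1] * 20
pred_a = [1] * 20             # classifier A: all gaps correct
pred_b = [0] * 10 + [1] * 10  # classifier B: first 10 gaps wrong
stat, p_value = mcnemar_test(y_true, pred_a, pred_b)
```

A small p-value indicates that the two classifiers' error rates differ by more than chance alone would explain; the test says nothing about which features drive the difference.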
