An Extended Learning Vector Quantization Algorithm Aiming at Recognition-Based Character Segmentation

Recognition-based segmentation strategies have greatly improved the performance of optical character recognition systems. The key issue in these strategies is designing a classifier that provides accurate rejection information. Many learning algorithms, such as GLVQ and H2M-LVQ, are not suitable for large category sets and multiple prototypes. More seriously, they often become trapped in local minima and suffer from overtraining. In this paper, we propose an extended learning vector quantization algorithm that efficiently trains the nearest prototype classifier with negative samples. The cost function is based on multiple confusable prototype pairs, which makes our algorithm insensitive to initialization. We also introduce a safe-zone criterion to avoid overtraining. Experimental results show that a classifier trained by the proposed method achieves good recognition performance and provides accurate rejection information for segmentation.
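To make the role of rejection concrete, the following is a minimal sketch of a nearest prototype classifier that rejects inputs lying far from every prototype, as a segmentation candidate that is not a valid character might. It is not the training algorithm proposed in the paper: the squared Euclidean distance, the fixed threshold `max_dist`, the function names, and the toy prototypes are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# A prototype is a (feature vector, class label) pair; a class may own several.
Prototype = Tuple[Sequence[float], str]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_with_reject(
    x: Sequence[float], prototypes: List[Prototype], max_dist: float
) -> Optional[str]:
    """Return the label of the nearest prototype, or None to reject.

    Rejection rule (an assumption for this sketch): reject when the
    squared distance to the nearest prototype exceeds max_dist.
    """
    best_label: Optional[str] = None
    best_d = float("inf")
    for vec, label in prototypes:
        d = sq_dist(x, vec)
        if d < best_d:
            best_label, best_d = label, d
    if best_d > max_dist:
        return None  # treated as a non-character (negative) sample
    return best_label


# Hypothetical toy prototypes: two for class "a", one for class "b".
prototypes: List[Prototype] = [
    ((0.0, 0.0), "a"),
    ((1.0, 1.0), "a"),
    ((4.0, 4.0), "b"),
]

accepted = classify_with_reject((0.9, 1.2), prototypes, max_dist=2.0)
rejected = classify_with_reject((10.0, 10.0), prototypes, max_dist=2.0)
```

In a recognition-based segmenter, a `None` result for a candidate segment would suggest that the cut is wrong; training with negative samples aims to make such rejections reliable.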
