Incorporating conditional independence assumption with support vector machines to enhance handwritten character segmentation performance

We learn Bayesian belief networks (BBN) from corpora and incorporate the inferential knowledge extracted from them into a support vector machine (SVM) classifier for character segmentation of unconstrained handwritten text. By exploiting the abundant unlabeled data found in image databases in addition to the available labeled examples, we avoid the expensive task of annotating the entire training set and improve the performance of the character segmentation learner. Alongside this approach, which has not previously been applied to this task, we experimented with two well-known machine learning methods: learning vector quantization and a simplified version of transformation-based learning. We argue that a classifier built from BBN and SVM is well suited to learning to identify the correct segment boundaries, and empirical results support this claim. Performance was evaluated methodically on both English and Modern Greek corpora in order to determine the unbiased behavior of the trained models. Even limited training data yield satisfactory results. We achieve precision exceeding 87.5%.