A New Editing Scheme Based on a Fast Two-String Median Computation Applied to OCR

This paper presents a new fast algorithm for computing an approximation to the median of two character strings, each representing a 2D shape, and applies it in a new dataset editing scheme that lowers the classification error rate. The median string is obtained by applying some of the edit operations from the minimum-cost edit sequence to one of the original strings. The new editing scheme relaxes the deletion criterion of the Wilson editing procedure: in practice, not every instance misclassified by its nearest neighbors is pruned. Instead, an artificial instance is added to the dataset in the expectation that the original instance will be classified correctly in the future. The artificial instance is the median of the misclassified sample and its nearest neighbor of the same class. Experiments on two widely used datasets of handwritten characters show that this preprocessing scheme can reduce the classification error in about 78% of trials.
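The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on several assumptions the abstract does not state: unit-cost Levenshtein operations, applying the first half of the non-match operations in the edit script, a leave-one-out 1-NN rule, and dropping a misclassified sample only when it has no same-class neighbor. The function names `edit_script`, `approx_median`, and `relaxed_edit` are hypothetical.

```python
def edit_script(a, b):
    """Return (distance, ops) for a unit-cost minimum edit sequence a -> b."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
                          d[i - 1][j] + 1,
                          d[i][j - 1] + 1)
    # Backtrace one optimal path through the table.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            kind = "match" if a[i - 1] == b[j - 1] else "sub"
            ops.append((kind, a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("del", a[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, b[j - 1]))
            j -= 1
    ops.reverse()
    return d[n][m], ops


def approx_median(a, b):
    """Apply half of the edit operations (assumed choice: the first half) to a."""
    dist, ops = edit_script(a, b)
    budget = dist // 2
    out = []
    for kind, x, y in ops:
        if kind == "match":
            out.append(x)
        elif budget > 0:          # operation applied: move towards b
            budget -= 1
            if kind != "del":
                out.append(y)
        elif kind != "ins":       # operation skipped: keep a's symbol
            out.append(x)
    return "".join(out)


def relaxed_edit(data):
    """data: list of (string, label) pairs with at least two entries."""
    kept = []
    for idx, (s, label) in enumerate(data):
        others = [p for k, p in enumerate(data) if k != idx]
        nearest = min(others, key=lambda p: edit_script(s, p[0])[0])
        if nearest[1] == label:
            kept.append((s, label))
            continue
        same = [p for p in others if p[1] == label]
        if not same:
            continue  # assumption: prune only when no same-class neighbor exists
        friend = min(same, key=lambda p: edit_script(s, p[0])[0])
        kept.append((s, label))
        kept.append((approx_median(s, friend[0]), label))
    return kept
```

With unit costs, applying k operations of an optimal script of total cost D yields a string at distance k from the first string and D - k from the second, so the result lies on a shortest edit path between the two inputs. A call such as `relaxed_edit([("aaaa", "A"), ("aaab", "A"), ("bbbb", "B"), ("bbba", "B"), ("abbb", "A")])` keeps all five originals and adds an artificial class-A string between "abbb" and "aaab".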
