Multi-scale feature extraction and nested-subset classifier design for high accuracy handwritten character recognition

Both efficient representation and robust classification are essential to high-performance recognition of cursive offline handwritten Chinese characters. A novel multi-scale feature extraction method based on information entropy theory is presented, in which feature detection and compression are combined into a single integrated optimization process. A series of optimal feature spaces is constructed at varying values of the scale parameter, and the best one is selected by maximizing the LDA criterion over the scale interval. For more robust classification, we introduce a structure into the Mahalanobis distance classifier and, following the ideas of structural risk minimization, strike a balance between machine capacity and performance on the training data. A high-accuracy recognition system is developed based on the new methods and, for the first time, four widely different databases, ranging from regular to completely unconstrained handwriting with several structural distortions and stroke connections, are fully tested. Accuracies of 99.5% on the regular database and 88.4% on the cursive one are achieved at a speed of over 40 characters/s.
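
The scale-selection step can be illustrated with a minimal sketch. It assumes the LDA criterion takes the common form tr(Sw^-1 Sb), where Sw and Sb are the within-class and between-class scatter matrices, and that a feature matrix has already been extracted at each candidate scale. The function names `lda_criterion` and `select_scale`, the `ridge` regularizer, and the dictionary of per-scale features are hypothetical and are not taken from the paper; the entropy-based feature extraction itself is not reproduced here.

```python
import numpy as np


def lda_criterion(X, y, ridge=1e-6):
    """Return tr(Sw^-1 Sb) for features X of shape (n, d) and labels y of shape (n,).

    Assumption: this is one standard form of the LDA (Fisher) criterion;
    the paper may use a different variant.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = Xc - mc
        Sw += diff.T @ diff
        dm = (mc - mean_all)[:, None]
        Sb += len(Xc) * (dm @ dm.T)
    # Small ridge term (an assumption) keeps Sw invertible on small samples.
    Sw += ridge * np.eye(d)
    return float(np.trace(np.linalg.solve(Sw, Sb)))


def select_scale(features_by_scale, y):
    """Pick the scale whose feature space maximizes the LDA criterion.

    features_by_scale maps each candidate scale value to its (n, d) feature matrix.
    """
    scores = {s: lda_criterion(X, y) for s, X in features_by_scale.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

In use, `select_scale` would be called once with the features computed on the training set at every candidate scale, and the returned scale would then be fixed for recognition. Evaluating a finite grid of scales is a simplification of searching over a continuous scale interval.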