An optimized hill climbing algorithm for feature subset selection: evaluation on handwritten character recognition

This paper presents an optimized hill-climbing algorithm to select a subset of features for handwritten character recognition. The search takes into account a random mutation strategy and the initial relevance of each feature in the recognition process. A first set of experiments has shown a reduction in the number of features used in an MLP-based character recognizer from 132 to 77 (a reduction of 42%) without a significant loss in recognition rates, which are 99.1% for 30,089 digits and 93.0% for 11,941 uppercase characters, both sets of handwritten samples taken from the NIST SD19 database. Additional experiments were carried out in which some loss in recognition rate was tolerated during feature subset selection. A byproduct of these experiments is a cascade classifier based on feature subsets of different sizes, which reduces the complexity of the classification task by 86.54% in the digit recognition experiment. The proposed feature selection method has proven to be an interesting strategy for implementing a wrapper approach without the need for complex and expensive hardware architectures.
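
The search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how relevance biases the mutation, so the inverse-relevance sampling below is an assumption, as are the names `hill_climb_select`, `evaluate`, and `max_loss`. The `evaluate` callback stands in for the wrapper step (in the paper, the recognition rate of the MLP-based recognizer on the candidate subset), and `toy_score` is a hypothetical replacement used only to make the example runnable.

```python
import random


def hill_climb_select(relevance, evaluate, max_loss=0.0, iterations=200, seed=0):
    """Greedy feature removal with relevance-biased random mutation (sketch)."""
    rng = random.Random(seed)
    mask = [True] * len(relevance)   # start from the full feature set
    baseline = evaluate(mask)        # score obtained with all features

    for _ in range(iterations):
        active = [i for i, keep in enumerate(mask) if keep]
        if len(active) <= 1:
            break
        # Assumption: less relevant features are proposed for removal more often.
        weights = [1.0 / (relevance[i] + 1e-9) for i in active]
        candidate = rng.choices(active, weights=weights, k=1)[0]

        mask[candidate] = False      # mutate: drop one feature
        if evaluate(mask) < baseline - max_loss:
            mask[candidate] = True   # score fell too far: revert the mutation
    return mask


# Hypothetical stand-in for the classifier: only features 0-2 carry information.
INFORMATIVE = (0, 1, 2)


def toy_score(mask):
    return sum(mask[i] for i in INFORMATIVE) / len(INFORMATIVE)


relevance = [1.0, 1.0, 1.0] + [0.1] * 7   # illustrative relevance values
selected = hill_climb_select(relevance, toy_score)
print([i for i, keep in enumerate(selected) if keep])
```

Setting `max_loss` above zero corresponds to the additional experiments in which some loss in recognition rate is tolerated, which generally yields smaller subsets.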