Adaptive boosting with leader based learners for classification of large handwritten data

Boosting is a general method for improving the accuracy of a learning algorithm. AdaBoost, short for adaptive boosting, repeatedly applies a weak (base) learning algorithm to obtain a weak hypothesis in each round, adapting to the error rates of the individual weak hypotheses. A large, complex handwritten dataset is under study. Repeated application of a weak learner to such a large dataset incurs substantial processing time. In view of this, instead of using the entire training data for learning, we propose to use only prototypes. The base learner is a nearest-neighbour classifier that employs prototypes generated by the "leader" clustering algorithm, a single-pass method whose time and computational complexity are linear in the size of the data. The prototype set alone is used as training data. In developing the algorithm, domain knowledge of the handwritten data under study is exploited. By fusing clustering-based prototype selection, AdaBoost, and the nearest-neighbour classifier, a classification accuracy higher than previously reported on this data is obtained in fewer iterations. The procedure thus integrates the clustering outcome, in the form of prototypes, with boosting.
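
The abstract describes the leader algorithm only in words. As an illustration, here is a minimal sketch of single-pass leader clustering in Python, assuming Euclidean distance and a user-chosen distance threshold; the function name and the threshold parameter are hypothetical, not taken from the paper:

```python
import numpy as np

def leader_clustering(X, threshold):
    """Single-pass leader clustering (a sketch).

    Scan the data once: a pattern joins the first existing leader
    within `threshold` distance, otherwise it becomes a new leader.
    The single scan gives the linear behaviour the abstract cites.
    """
    leaders = []
    for x in X:
        for lead in leaders:
            if np.linalg.norm(x - lead) <= threshold:
                break              # absorbed by an existing cluster
        else:
            leaders.append(x)      # x becomes a new leader (prototype)
    return np.array(leaders)
```

The leaders themselves serve as the prototype set; a smaller threshold yields more prototypes and hence a closer approximation to the full training data.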

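How the prototypes plug into boosting can likewise be sketched. Below is a hedged AdaBoost.M1-style loop with a 1-NN classifier over per-class leader prototypes as the base learner. The abstract does not specify how the boosting weights interact with prototype selection, so the weighted-resampling step is an assumption, and all names are illustrative; the snippet reuses leader_clustering from the sketch above.

```python
import numpy as np

def nn_predict(prototypes, proto_labels, X):
    """1-NN prediction against the prototype set."""
    d = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
    return proto_labels[np.argmin(d, axis=1)]

def boost_leader_nn(X, y, threshold, rounds=10):
    """AdaBoost.M1 with a leader-prototype 1-NN base learner (a sketch).

    Each round resamples the training set according to the boosting
    weights (an assumed coupling), runs leader clustering per class on
    the sample to obtain prototypes, and uses 1-NN over those
    prototypes as the weak hypothesis.
    """
    n = len(X)
    w = np.full(n, 1.0 / n)            # boosting weights over patterns
    ensemble = []                      # (alpha, prototypes, labels) per round
    rng = np.random.default_rng(0)
    for _ in range(rounds):
        idx = rng.choice(n, size=n, replace=True, p=w)
        protos, labels = [], []
        for c in np.unique(y):
            Xc = X[idx][y[idx] == c]
            if len(Xc) == 0:
                continue               # class absent from this resample
            P = leader_clustering(Xc, threshold)
            protos.append(P)
            labels.append(np.full(len(P), c))
        protos = np.vstack(protos)
        labels = np.concatenate(labels)
        pred = nn_predict(protos, labels, X)
        err = w[pred != y].sum()
        if err >= 0.5 or err == 0:     # AdaBoost.M1 stopping conditions
            break
        beta = err / (1.0 - err)
        w[pred == y] *= beta           # down-weight correct patterns
        w /= w.sum()
        ensemble.append((np.log(1.0 / beta), protos, labels))
    return ensemble

def boost_predict(ensemble, X, classes):
    """Combine the rounds by a weighted vote over class labels."""
    votes = np.zeros((len(X), len(classes)))
    for alpha, protos, labels in ensemble:
        pred = nn_predict(protos, labels, X)
        for j, c in enumerate(classes):
            votes[:, j] += alpha * (pred == c)
    return classes[np.argmax(votes, axis=1)]
```

Because each round trains only on the compact prototype set rather than the full data, the per-round cost stays low, which is the efficiency argument the abstract makes for combining the leader algorithm with boosting.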