Coherence Functions for Multicategory Margin-based Classification Methods

Margin-based classification methods are typically devised based on a majorization-minimization procedure, which approximately solves an otherwise intractable minimization problem defined with the 0-1 loss. The extension of such methods from the binary classification setting to the more general multicategory setting turns out to be nontrivial. In this paper, our focus is to devise margin-based classification methods that can be seamlessly applied to both settings, with the binary setting simply as a special case. In particular, we propose a new majorization loss function that we call the coherence function, and then devise a new multicategory margin-based boosting algorithm based on the coherence function. Analogous to deterministic annealing, the coherence function is characterized by a temperature factor. It is closely related to the multinomial log-likelihood function and its limit at zero temperature corresponds to a multicategory hinge loss function.
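To make the role of the temperature concrete, the following is a minimal sketch rather than the paper's own implementation. The abstract does not state the formula, so the sketch assumes a coherence function of the form T log(1 + sum over j != c of exp((1 + g_j - g_c) / T)), where g is a vector of class scores, c is the true class, and T > 0 is the temperature. The function names coherence_loss, multiclass_hinge, and zero_one_loss are hypothetical and chosen only for illustration. Under this assumed form, the loss upper-bounds a max-violation multicategory hinge loss, which in turn upper-bounds the 0-1 loss, and the gap to the hinge loss shrinks as T approaches zero.

```python
import math


def coherence_loss(scores, label, temperature):
    """Assumed form: T * log(1 + sum_{j != label} exp((1 + g_j - g_label) / T))."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # The leading 0.0 is the exponent of the constant '1' inside the logarithm;
    # the remaining entries are the scaled margin violations of the wrong classes.
    exponents = [0.0] + [
        (1.0 + g - scores[label]) / temperature
        for j, g in enumerate(scores)
        if j != label
    ]
    # Log-sum-exp with the maximum factored out, to avoid overflow at small T.
    top = max(exponents)
    return temperature * (top + math.log(sum(math.exp(e - top) for e in exponents)))


def multiclass_hinge(scores, label):
    """Max-violation hinge: max(0, max_{j != label} (1 + g_j - g_label))."""
    worst = max(1.0 + g - scores[label] for j, g in enumerate(scores) if j != label)
    return max(0.0, worst)


def zero_one_loss(scores, label):
    """0-1 loss of the arg-max prediction (ties go to the lowest class index)."""
    predicted = max(range(len(scores)), key=scores.__getitem__)
    return 0.0 if predicted == label else 1.0


if __name__ == "__main__":
    scores = [2.0, 0.5, -1.0]  # illustrative class scores, not data from the paper
    for label in range(len(scores)):
        for temperature in (1.0, 0.1, 0.01):
            print(label, temperature,
                  coherence_loss(scores, label, temperature),
                  multiclass_hinge(scores, label),
                  zero_one_loss(scores, label))
```

With the assumed form, the difference between the coherence value and the hinge value is at most T log m for m classes, which is one way to see why lowering the temperature tightens the bound.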
