On Binary Reduction of Large-Scale Multiclass Classification Problems

In large-scale problems, traditional multiclass classification approaches must contend with class imbalance and complexity issues that render them inoperative in some extreme cases. In this paper, we study a transformation that reduces the initial multiclass classification of examples to a binary classification of pairs of examples and classes. We present generalization error bounds that exhibit the interdependency between the pairs of examples and that recover known results on binary classification with i.i.d. data. We show the efficiency of the resulting algorithm compared to state-of-the-art multiclass classification strategies on two large-scale document collections, especially in the interesting case where the number of classes becomes very large.
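To make the reduction concrete, here is a minimal sketch of one way a multiclass training set could be turned into a binary problem over (example, class) pairs. It is an illustration under stated assumptions, not the exact construction studied in the paper: the function names `reduce_to_binary` and `predict`, the 0/1 labelling, and the toy scoring function are all hypothetical.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Pair = Tuple[Tuple[str, Hashable], int]


def reduce_to_binary(
    examples: Sequence[str],
    labels: Sequence[Hashable],
    classes: Sequence[Hashable],
) -> List[Pair]:
    """Build one binary instance per (example, class) couple.

    Assumption: the binary label is 1 when the class is the example's
    true class and 0 otherwise. The paper's own pairing may differ.
    """
    pairs: List[Pair] = []
    for x, y in zip(examples, labels):
        for c in classes:
            pairs.append(((x, c), 1 if c == y else 0))
    return pairs


def predict(
    score: Callable[[str, Hashable], float],
    x: str,
    classes: Sequence[Hashable],
) -> Hashable:
    """Return the class whose (example, class) couple scores highest."""
    return max(classes, key=lambda c: score(x, c))


if __name__ == "__main__":
    docs = ["sports news", "politics today"]
    gold = ["sports", "politics"]
    all_classes = ["sports", "politics", "science"]

    binary_set = reduce_to_binary(docs, gold, all_classes)
    print(len(binary_set))  # one instance per (example, class) couple

    # Hypothetical scorer: 1.0 if the class name occurs in the text.
    toy_score = lambda x, c: float(c in x)
    print(predict(toy_score, "sports news", all_classes))
```

Note that the binary instances built from the same example share that example, so they are not independent of one another. This sketch also enumerates every class for every example; with a very large number of classes, one would likely need to subsample the negative classes.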
