Extreme Multi-Class Classification

We consider the multi-class classification problem in the setting where the number of labels is very large, so that train and test running times logarithmic in the number of labels are highly desirable. Additionally, in our setting the labels are feature dependent. We propose a reduction of this problem to a set of binary regression problems organized in a tree structure, and we introduce a simple top-down criterion for purification of labels that admits gradient-descent-style optimization. Furthermore, we prove that maximizing the proposed objective function (the splitting criterion) leads simultaneously to pure and balanced splits. We use the entropy of the tree leaves, a standard measure used in decision trees, to measure the quality of the obtained tree, and we show an upper bound on the number of splits required to reduce this measure below a given threshold. Finally, we show empirically that the splits recovered by our algorithm lead to significantly smaller error than random splits.
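The tree-of-binary-regressors reduction described above can be illustrated with a minimal sketch. This is not the paper's algorithm: the splitting criterion here is a hypothetical stand-in (labels are ordered by their mean first feature and the ordering is cut in half, which keeps the tree balanced by construction, and training points are routed by true label membership, which keeps nodes pure), but it shows the key structural property that prediction costs one binary decision per level, i.e. a number of decisions logarithmic in the number of labels:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Node:
    def __init__(self, labels):
        self.labels = labels  # class labels reachable below this node
        self.w = None         # binary regressor weights (internal nodes only)
        self.left = None
        self.right = None

def fit_regressor(X, t, lr=0.5, epochs=300):
    # Logistic regression trained by batch gradient descent;
    # predicts P(route left | x) at an internal node.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * X.T @ (sigmoid(X @ w) - t) / len(t)
    return w

def build_tree(X, y, labels):
    node = Node(labels)
    if len(labels) == 1:
        return node  # leaf: a single label remains
    # Hypothetical split heuristic (NOT the paper's criterion): order labels
    # by their mean first feature and cut the ordering in half, which keeps
    # the tree balanced by construction.
    order = sorted(labels, key=lambda c: X[y == c, 0].mean())
    left = set(order[: len(order) // 2])
    t = np.isin(y, sorted(left)).astype(float)
    node.w = fit_regressor(X, t)
    mask = t == 1.0  # route training points by true label membership
    node.left = build_tree(X[mask], y[mask], left)
    node.right = build_tree(X[~mask], y[~mask], set(order[len(order) // 2:]))
    return node

def predict(node, x):
    depth = 0
    while node.w is not None:  # one binary decision per level
        node = node.left if sigmoid(x @ node.w) > 0.5 else node.right
        depth += 1
    return next(iter(node.labels)), depth

# Toy data: 4 well-separated 1-D Gaussian classes plus a bias feature.
rng = np.random.default_rng(0)
means = [-3.0, -1.0, 1.0, 3.0]
X = np.vstack([np.column_stack([rng.normal(m, 0.3, 100), np.ones(100)])
               for m in means])
y = np.repeat(np.arange(4), 100)
tree = build_tree(X, y, set(range(4)))
preds, depths = zip(*(predict(tree, x) for x in X))
acc = float(np.mean(np.array(preds) == y))
print(acc, max(depths))  # every test point uses ceil(log2(4)) = 2 decisions
```

With 1000 labels the same structure would need only about 10 binary decisions per prediction, which is the logarithmic train/test cost the abstract refers to.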