Scalable Semi-Supervised Aggregation of Classifiers

We present and empirically evaluate an efficient algorithm that learns to aggregate the predictions of an ensemble of binary classifiers. The algorithm uses the structure of the ensemble's predictions on unlabeled data to yield significant performance improvements. It does so without making assumptions about the structure or origin of the ensemble, without any tuning parameters, and as scalably as linear learning. We empirically demonstrate these performance gains with random forests.
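
The abstract leaves the aggregation rule itself implicit. One natural reading, consistent with the minimax framework of Balsubramani and Freund for combining classifiers using unlabeled data, is sketched below in NumPy: base classifiers vote in {-1, +1}, a per-classifier correlation vector b is estimated from a small labeled sample, and a nonnegative weight vector is fit by projected stochastic subgradient descent over the unlabeled votes, so each pass costs the same as a pass of linear learning. This is a minimal sketch under that assumption; the function names, defaults, and the simple correlation estimator are illustrative, not taken from the paper.

```python
import numpy as np

def estimate_correlations(F_labeled, y):
    """Estimate b_i = E[y * h_i(x)] for each base classifier from a small
    labeled sample. F_labeled is (p, m) with votes in {-1, +1}; y is (m,)."""
    return F_labeled @ y / len(y)

def fit_weights(F_unlabeled, b, lr=0.1, epochs=20, seed=0):
    """Projected stochastic subgradient descent on the convex slack function

        gamma(w) = (1/n) * sum_j max(0, |w . x_j| - 1)  -  b . w,   w >= 0,

    where x_j is the j-th column of F_unlabeled (the ensemble's votes on
    unlabeled point j). Each update touches one column, so a full pass
    costs O(n * p), matching the cost of linear learning."""
    p, n = F_unlabeled.shape
    w = np.zeros(p)
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for j in rng.permutation(n):
            x = F_unlabeled[:, j]
            margin = w @ x
            # The hinge term contributes sign(margin) * x only when |margin| > 1.
            grad = -b + (np.sign(margin) * x if abs(margin) > 1.0 else 0.0)
            w = np.maximum(w - lr * grad, 0.0)  # project back onto w >= 0
    return w

def aggregate(F, w):
    """Aggregated prediction on each column of F: the weighted vote,
    clipped to [-1, 1]. Take the sign for a hard binary label."""
    return np.clip(w @ F, -1.0, 1.0)

if __name__ == "__main__":
    # Toy usage: 10 noisy voters, 50 labeled points, 450 unlabeled ones.
    rng = np.random.default_rng(1)
    y_true = rng.choice([-1, 1], size=500)
    F = np.where(rng.random((10, 500)) < 0.65, y_true, -y_true)  # ~65%-accurate voters
    b = estimate_correlations(F[:, :50], y_true[:50])            # small labeled slice
    w = fit_weights(F[:, 50:], b)
    acc = np.mean(np.sign(aggregate(F[:, 50:], w)) == y_true[50:])
    print(f"aggregated accuracy on unlabeled pool: {acc:.3f}")
```

The clipped weighted vote and the hinge-shaped objective fall out of the minimax analysis; the stochastic updates are what would make such a method scale, since the objective decomposes as a sum over unlabeled examples.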
