Unsupervised Ensemble Regression

Consider a regression problem with no labeled data, where the only observations are the predictions $f_i(x_j)$ of $m$ experts $f_i$ over many samples $x_j$. With no knowledge of the experts' accuracies, is it still possible to accurately estimate the unknown responses $y_j$? Can one detect the least or most accurate experts? In this work we propose a framework to study these questions, based on the assumption that the $m$ experts have uncorrelated deviations from the optimal predictor. Assuming the first two moments of the response are known, we develop methods to detect the best and worst regressors and derive U-PCR, a novel principal components approach for unsupervised ensemble regression. We provide theoretical support for U-PCR and illustrate its improved accuracy over the ensemble mean and median on a variety of regression problems.
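The idea described above can be sketched in code. Under the assumed model, each expert's prediction is the true response plus an uncorrelated deviation, so the covariance matrix of the prediction matrix is approximately rank-one plus diagonal, and its leading principal component yields combination weights. The sketch below is an illustrative reconstruction under these assumptions; the function name `u_pcr`, the sign convention, and the moment-matching rescaling are hypothetical choices, not the paper's exact algorithm.

```python
import numpy as np

def u_pcr(F, mu_y, var_y):
    """Illustrative PCA-based unsupervised ensemble regression.

    F      : (m, n) array of m experts' predictions on n samples.
    mu_y   : assumed-known mean of the response.
    var_y  : assumed-known variance of the response.
    Returns (y_hat, w): combined predictions and expert weights.
    """
    m, n = F.shape
    # Center each expert's predictions over the samples.
    Fc = F - F.mean(axis=1, keepdims=True)
    # m x m sample covariance of the experts' predictions.
    C = Fc @ Fc.T / n
    # Leading eigenvector of the covariance (eigh returns ascending order).
    eigvals, eigvecs = np.linalg.eigh(C)
    v = eigvecs[:, -1]
    # Fix the sign so the weights are predominantly positive.
    v = v * np.sign(v.sum())
    # Normalize the weights to sum to one.
    w = v / v.sum()
    # Weighted combination of the centered predictions.
    y_hat = w @ Fc
    # Rescale to match the assumed-known response moments.
    y_hat = y_hat / y_hat.std() * np.sqrt(var_y) + mu_y
    return y_hat, w
```

In a quick simulation with experts of unequal noise levels, the leading-eigenvector weights down-weight noisier experts relative to a plain average, which is the intuition behind preferring this spectral combination over the ensemble mean.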
