Aggregation using input-output trade-off

In this paper, we introduce a new learning strategy based on a seminal idea of Mojirsheibani (1999, 2000, 2002a, 2002b), who proposed a method for combining several classifiers that relies on a notion of consensus. In many aggregation methods, the prediction at a new observation x is obtained by forming a linear or convex combination of a collection of basic estimators r1(x), ..., rm(x) previously calibrated on a training data set. Mojirsheibani's idea is instead to compute the prediction at a new observation by combining selected outputs of the training examples. The output of a training example is selected if a consensus is observed: the predictions produced for that training example by the different machines must be "similar" to the predictions for the new observation. This approach was recently extended to regression by Biau et al. (2016). In the original scheme, the agreement condition is required to hold for all individual estimators, which is inadequate as soon as one of the initial estimators is poor. In practice, a few disagreements are allowed; to establish the theoretical results, the proportion of estimators satisfying the condition is required to tend to 1. In this paper, we propose an alternative procedure that mixes the consensus idea on the predictions with the Euclidean distance computed between the inputs. This can be seen as an alternative way to reduce the influence of a possibly poor estimator in the initial list, through a constraint on the inputs. We prove the consistency of our strategy in classification and in regression. We also provide numerical experiments on simulated and real data to illustrate the benefits of this new aggregation method. Overall, our practical study shows that our method may perform much better than the original combination technique and, in particular, exhibits far less variance. We also show on simulated examples that this procedure mixing inputs and outputs remains robust to high-dimensional inputs.
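To make the mechanism concrete, the following minimal NumPy sketch illustrates a consensus-based combined regression estimator of this general kind: a training point contributes to the prediction at x when enough of the basic machines give it an output close to their output at x (threshold eps), and, in the input-output variant sketched here, when the point is also close to x in Euclidean distance (threshold delta). The function name, thresholds, and fallback rule are illustrative assumptions, not the paper's exact procedure or tuning.

import numpy as np

def combined_predict(x, X_train, y_train, machines, eps, delta, agree_frac=1.0):
    """Consensus-based aggregation of pre-trained machines (illustrative sketch).

    x          : (d,) new observation
    X_train    : (n, d) training inputs (ideally held out from machine calibration)
    y_train    : (n,) training outputs
    machines   : list of fitted predictors, each with a .predict((k, d)) -> (k,) method
    eps        : output-consensus threshold |r_m(x) - r_m(X_i)| <= eps
    delta      : input-proximity threshold ||x - X_i|| <= delta (the extra constraint)
    agree_frac : fraction of machines that must satisfy the consensus condition
    """
    x = np.atleast_2d(x)
    # Predictions of each machine at the new point and on the training inputs.
    preds_x = np.array([m.predict(x)[0] for m in machines])          # (M,)
    preds_train = np.array([m.predict(X_train) for m in machines])   # (M, n)

    # Output consensus: proportion of machines agreeing within eps for each training point.
    agree = np.abs(preds_train - preds_x[:, None]) <= eps            # (M, n)
    enough_consensus = agree.mean(axis=0) >= agree_frac              # (n,)

    # Input proximity: keep only training points close to x in Euclidean distance.
    close_inputs = np.linalg.norm(X_train - x, axis=1) <= delta      # (n,)

    selected = enough_consensus & close_inputs
    if not selected.any():
        return y_train.mean()   # fallback when no training point is selected
    # Prediction: average of the outputs of the selected training points.
    return y_train[selected].mean()

Setting agree_frac below 1 corresponds to tolerating a few disagreements among the machines, as discussed above, while delta controls how strongly the input constraint limits the influence of a poor estimator in the initial list.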

[1] A. S. Dalalyan and A. B. Tsybakov. Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity. Machine Learning, 2008.

[2] L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics, 2002.

[3] M. Wegkamp. Model selection in nonparametric regression, 2003.

[4] Y. Yang. Adaptive Regression by Mixing, 2001.

[5] L. Devroye and A. Krzyżak. An equivalence theorem for L1 convergence of the kernel regression estimate, 1989.

[6] A. Nemirovski. Topics in Non-Parametric Statistics, 2000.

[7] M. Mojirsheibani. Combining Classifiers via Discretization, 1999.

[8] P. Massart. Concentration inequalities and model selection, 2007.

[9] J.-Y. Audibert. Aggregated estimators and empirical complexity for least square regression, 2004.

[10] M. Mojirsheibani. An Almost Surely Optimal Combined Classification Rule, 2002.

[11] O. Catoni. Statistical learning theory and stochastic optimization, 2004.

[12] Y. Yang. Combining Different Procedures for Adaptive Regression. Journal of Multivariate Analysis, 2000.

[13] M. Mojirsheibani. A comparison study of some combined classifiers, 2002.

[14] A. Juditsky and A. Nemirovski. Functional aggregation for nonparametric regression, 2000.

[15] Y. Yang. Aggregating regression procedures to improve performance, 2004.

[16] A. Cholaquidis et al. A nonlinear aggregation type classifier. Journal of Multivariate Analysis, 2015.

[17] F. Bunea, A. B. Tsybakov, and M. H. Wegkamp. Sparsity oracle inequalities for the Lasso, 2007. arXiv:0705.3308.

[18] M. Mojirsheibani. A kernel-based combined classification rule, 2000.

[19] L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Stochastic Modelling and Applied Probability, Springer, 1996.

[20] G. Biau, A. Fischer, B. Guedj, and J. D. Malley. COBRA: A combined regression strategy. Journal of Multivariate Analysis, 2016.

[21] F. Bunea, A. B. Tsybakov, and M. H. Wegkamp. Aggregation and sparsity via ℓ1 penalized least squares, 2006.

[22] M. Mojirsheibani et al. A simple method for combining estimates to improve the overall error rates in classification. Computational Statistics, 2015.

[23] F. Bunea, A. B. Tsybakov, and M. H. Wegkamp. Aggregation for Gaussian regression, 2007. arXiv:0710.3654.