Rho-estimators revisited: General theory and applications

Following Baraud, Birg\'e and Sart (2017), we pursue our attempt to design a robust universal estimator of the joint distribution of $n$ independent (but not necessarily i.i.d.) observations for a Hellinger-type loss. Given such observations with an unknown joint distribution $\mathbf{P}$ and a dominated model $\mathscr{Q}$ for $\mathbf{P}$, we build an estimator $\widehat{\mathbf{P}}$ based on $\mathscr{Q}$ and measure its risk by a Hellinger-type distance. When $\mathbf{P}$ does belong to the model, this risk is bounded by a quantity that depends on the local complexity of the model in a vicinity of $\mathbf{P}$. In most situations this bound corresponds to the minimax risk over the model (up to a possible logarithmic factor). When $\mathbf{P}$ does not belong to the model, the risk involves an additional bias term proportional to the distance between $\mathbf{P}$ and $\mathscr{Q}$, whatever the true distribution $\mathbf{P}$. From this point of view, this new version of $\rho$-estimators improves upon the previous one described in Baraud, Birg\'e and Sart (2017), which required that $\mathbf{P}$ be absolutely continuous with respect to some known reference measure. Further improvements have been made over the former construction. In particular, the new version provides a very general treatment of the regression framework with random design as well as a computationally tractable procedure for aggregating estimators. We also give some conditions for the Maximum Likelihood Estimator to be a $\rho$-estimator. Finally, we consider the situation where the Statistician has many different models at hand and we build a penalized version of the $\rho$-estimator for model selection and adaptation purposes. In the regression setting, this penalized estimator allows one to estimate not only the regression function but also the distribution of the errors.

[1]  Norbert Sauer,et al.  On the Density of Families of Sets , 1972, J. Comb. Theory A.

[2]  Y. Cherruault,et al.  Méthodes pour la recherche de points de selle , 1973 .

[3]  L. Le Cam Convergence of Estimates Under Dimensionality Restrictions , 1973 .

[4]  Lucien Birgé Approximation dans les espaces métriques et théorie de l'estimation , 1983 .

[5]  D. Pollard Convergence of stochastic processes , 1984 .

[6]  L. Le Cam,et al.  Maximum likelihood : an introduction , 1990 .

[7]  Jon A. Wellner,et al.  Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .

[8]  P. Massart,et al.  Minimum contrast estimators on sieves: exponential bounds and rates of convergence , 1998 .

[9]  S. Geer Applications of empirical process theory , 2000 .

[10]  L. Györfi,et al.  A Distribution-Free Theory of Nonparametric Regression (Springer Series in Statistics) , 2002 .

[11]  Adam Krzyzak,et al.  A Distribution-Free Theory of Nonparametric Regression , 2002, Springer series in statistics.

[12]  V. Koltchinskii,et al.  Concentration inequalities and asymptotic results for ratio type empirical processes , 2006, math/0606788.

[13]  L. Le Cam Asymptotic Normality of Experiments , 2006 .

[14]  V. Koltchinskii Local Rademacher complexities and oracle inequalities in risk minimization , 2006, 0708.0083.

[15]  L. Birge,et al.  Model selection via testing: an alternative to (penalized) maximum likelihood estimators , 2006 .

[16]  P. Massart,et al.  Concentration inequalities and model selection , 2007 .

[17]  Jean-Yves Audibert,et al.  Robust linear least squares regression , 2010, 1010.0074.

[18]  Y. Baraud Bounding the Expectation of the Supremum of an Empirical Process Over a (Weak) VC-Major Class , 2014, 1411.5571.

[19]  M. Sart Estimating the conditional density by histogram type estimators and model selection , 2015, 1512.07052.

[20]  L. Birge,et al.  Rho-estimators for shape restricted density estimation , 2016 .

[21]  L. Birge,et al.  A new method for estimation and model selection: $\rho$-estimation , 2014, 1403.6057.