Mathematisches Forschungsinstitut Oberwolfach Modern Nonparametric Statistics: Going beyond Asymptotic Minimax

Oberwolfach Report 16/2010

During the years 1975-1990 a major emphasis in nonparametric estimation was put on computing the asymptotic minimax risk for many classes of functions. Modern statistical practice indicates some serious limitations of the asymptotic minimax approach and calls for new ideas and methods that can cope with the numerous challenges brought to statisticians by modern data sets.

Mathematics Subject Classification (2000): 62Gxx.

Introduction by the Organisers

The workshop took place from March 28 to April 2 and, as usual, talks were planned from Monday morning to Friday morning (most participants leaving on Friday afternoon), with a break on Wednesday afternoon for the traditional walk to Saint-Roman. In the end there were 48 participants, due to some late cancellations. Unfortunately, Iain Johnstone could not attend the meeting since he had a very important commitment in the US with the NSF during that week. However, he participated quite actively in the organization up to the last minute: we, the organizers, had the opportunity to meet during a previous workshop and also corresponded extensively by e-mail, through which the list of participants and talks and all final details were set up. Therefore we were really three organizers, and the success of the meeting should be credited to the three of us. Actually, the list of speakers and the schedule of the talks were ready before our arrival and only minor changes were made during that week. The precise schedule can be found at the end of our report.

During the years 1975-1990 (roughly speaking) a major emphasis in nonparametric estimation was put on computing the (possibly asymptotic) minimax risk for many classes of functions, starting from the simplest Hölder classes and moving to the more sophisticated Besov balls at the beginning of the 90’s.
It was clear, at that time, that this minimax point of view was quite pessimistic, since it was directed towards the worst case, and also unrealistic, since one never knows to which smoothness class (or other specific class) the true parameter belongs. Nevertheless, this approach made it possible to design useful estimators, which could be more or less practically calibrated (by cross-validation, for instance), and provided some benchmarks for the performance of a given method. Then, around the beginning of the 90’s, an important movement started towards what is now called adaptation, either to some smoothness class or to the specific function to be estimated. This was achieved with different tools such as Lepski’s method, the use of localized bases and thresholding, model selection, and so on. More recently, many new methods (aggregation of estimators, Lasso, etc.) appeared in order to cope with the numerous challenges brought to statisticians by modern data sets and the huge progress of computing: huge data sets, or situations where the number of unknown parameters is much larger than the number of observations, together with some sparsity assumption. This also coincides with an important renewal of Bayesian methods due to much better and more powerful computing facilities.

Workshop organization

In view of the importance of the numerous new techniques that are presently studied and used to solve the challenges posed by modern data sets, we decided that the main purpose of the workshop would be to expose many young researchers to those new techniques. We invited a number of established specialists and experts together with younger researchers (PhD students, postdocs, new assistant professors) in order to get a mix of generations and experiences. We also selected 5 senior professors to give longer talks (one hour and a half, one each morning) in order to develop their subject.
These speakers were specifically asked, several months before the workshop, to deliver these special lectures. We also spent a lot of time and discussion selecting the talks among the proposals by the participants, in order to keep maximal coherence between the subjects and keep the level as high as possible. We finally limited the number of talks to 24, including the five major ones mentioned above, and avoided a multiplication of short talks. All regular talks lasted 45 minutes, except on the last morning, when the MFO administration asked us to shorten the session for an early lunch (apparently for Easter vacation).

It was also an occasion for us to invite an unusually large number of participants (mostly young researchers and some more senior French researchers) who visited the MFO for the first time, which gave them an occasion to discover this very nice place, the wonderful library, the numerous working facilities and the excellent MFO organization (as usual). We tried, as much as possible, to organize our 8 sessions around themes such as Model Selection, Adaptive Density Estimation, High-dimensional Data and Sparsity, Statistics for Processes, and Nonparametric Bayesian Methods, along with some talks by young and promising researchers, who were given exactly the same time as the more senior ones.
