On Uniform Deviations of General Empirical Risks with Unboundedness, Dependence, and High Dimensionality

The statistical learning theory of risk minimization depends heavily on probability bounds for uniform deviations of empirical risks. Classical bounds based on Hoeffding's inequality cannot accommodate more general settings with unbounded loss and dependent data. This paper introduces an inequality that extends Hoeffding's inequality to handle these situations. We apply the inequality to obtain probability bounds for uniform deviations in a very general framework, which can involve discrete decision rules, an unbounded loss, and a dependence structure more general than either martingales or strong mixing. We consider two examples with high-dimensional predictors: autoregression (AR) with l1-loss, and an ARX model with variable selection for sign classification, which uses both lagged responses and exogenous predictors.
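
As background (standard material, stated here in our own notation rather than the paper's): the classical Hoeffding inequality says that if $X_1, \dots, X_n$ are independent random variables with $a_i \le X_i \le b_i$ and $S_n = X_1 + \cdots + X_n$, then for every $t > 0$,

\[
\mathbb{P}\bigl( |S_n - \mathbb{E} S_n| \ge t \bigr) \;\le\; 2 \exp\!\Bigl( -\frac{2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \Bigr).
\]

For a finite class $\mathcal{F}$ of decision rules, an i.i.d. sample $Z_1, \dots, Z_n$, and a loss $\ell$ taking values in $[a, b]$, a union bound over $\mathcal{F}$ then yields the classical uniform deviation bound

\[
\mathbb{P}\Bigl( \max_{f \in \mathcal{F}} \Bigl| \frac{1}{n} \sum_{t=1}^{n} \ell(f, Z_t) - \mathbb{E}\,\ell(f, Z_1) \Bigr| \ge \epsilon \Bigr) \;\le\; 2\,|\mathcal{F}|\, \exp\!\Bigl( -\frac{2 n \epsilon^2}{(b-a)^2} \Bigr).
\]

Both displays require bounded summands and independence. In the AR example with l1-loss, the empirical risk is presumably of the form $n^{-1} \sum_{t} \bigl| y_t - \sum_{j=1}^{p} \theta_j y_{t-j} \bigr|$, whose summands are both unbounded and serially dependent, so neither display applies directly; this is the gap the extended inequality is designed to fill.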
