Regularization of Case Specific Parameters: A New Approach for Improving Robustness and/or Efficiency of Statistical Methods

Regularization methods allow one to handle a variety of inferential problems where there are more covariates than cases, making it possible to consider a potentially enormous number of covariates for a problem. We exploit the power of these techniques, supersaturating models by augmenting the “natural” covariates in the problem with an additional indicator for each case in the data set. We attach a penalty term to these case-specific indicators which is designed to produce a desired effect. For regression methods with squared error loss, an ℓ1-type penalty on the case-specific parameters produces a regression which is robust to outliers and high-leverage cases. Through this modification we devise a robust LASSO which retains the desirable properties of the LASSO and performs better when outlying observations are present. For quantile regression methods, an ℓ2-type penalty decreases the variance of the fit enough to overcome an increase in bias. The paradigm thus allows us to robustify procedures which lack robustness and to increase the efficiency of procedures which are robust. Including the case-specific parameters can be viewed as a modification of the original loss function that produces a better estimator. For the LASSO with squared error loss, the modification yields Huber’s loss. The check loss function in quantile regression is adjusted to be quadratic near its minimum. This modification produces an averaging effect near the target quantile and thus more efficient quantile estimation in various settings. Applications to classification procedures such as logistic regression and support vector machines are also considered. Finally, a modification to cross-validation through use of a new validation function in quantile regression is investigated. The new validation function makes use of the same adjusted check loss that is used for estimation.
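The core device can be made precise by profiling out the case-specific parameters. A minimal sketch, in our own notation (the author's scaling of the penalty parameter λ may differ): with one indicator γ_i per case, the augmented least squares criterion is

\[
\min_{\beta,\,\gamma}\ \frac{1}{2}\sum_{i=1}^{n}\left(y_i - x_i^{\top}\beta - \gamma_i\right)^2 \;+\; \lambda \sum_{i=1}^{n} \lvert \gamma_i \rvert .
\]

For a fixed residual \( r_i = y_i - x_i^{\top}\beta \), the minimizer over \( \gamma_i \) is the soft-thresholding rule \( \hat{\gamma}_i = \operatorname{sign}(r_i)\,(\lvert r_i \rvert - \lambda)_+ \); substituting it back yields the profiled loss

\[
\rho_{\lambda}(r) =
\begin{cases}
\tfrac{1}{2} r^2, & \lvert r \rvert \le \lambda,\\
\lambda \lvert r \rvert - \tfrac{\lambda^2}{2}, & \lvert r \rvert > \lambda,
\end{cases}
\]

which is Huber's loss with bend at λ: gross residuals are absorbed by their case indicators and contribute only linearly. The quantile analogue pairs the check loss \( \rho_{\tau}(r) = r\,(\tau - I(r < 0)) \) with an ℓ2 penalty \( \lambda \gamma_i^2 \); minimizing over \( \gamma_i \) replaces the kink at zero with a quadratic segment,

\[
\rho_{\tau,\lambda}(r) =
\begin{cases}
\tau r - \tfrac{\tau^{2}}{4\lambda}, & r \ge \tfrac{\tau}{2\lambda},\\
\lambda r^{2}, & -\tfrac{1-\tau}{2\lambda} \le r < \tfrac{\tau}{2\lambda},\\
(\tau - 1)\, r - \tfrac{(1-\tau)^{2}}{4\lambda}, & r < -\tfrac{1-\tau}{2\lambda},
\end{cases}
\]

the "quadratic near its minimum" adjustment described above, whose local averaging is the source of the efficiency gain.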
