Lazy lasso for local regression

Locally weighted regression is a technique that predicts the response for new data items from their neighbors in the training data set, where closer data items are assigned higher weights in the prediction. However, the original method may suffer from overfitting and fail to select the relevant variables. In this paper we propose combining a regularization approach with locally weighted regression to achieve sparse models. Specifically, we use the lasso, a shrinkage and selection method for linear regression. We present an algorithm that embeds the lasso in an iterative procedure that alternately computes weights and performs lasso-wise regression. The algorithm is tested on three synthetic scenarios and two real data sets. Results show that the proposed method outperforms linear and local models for several kinds of scenarios.
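The alternation between weighting and lasso fitting can be illustrated with a minimal sketch. The abstract does not specify the details, so everything below is an assumption made for illustration: a Gaussian kernel for the weights, a plain coordinate-descent solver for the weighted lasso, the rule of recomputing distances using only the currently selected variables, and the hypothetical names `weighted_lasso`, `local_lasso_predict`, `lam`, and `bandwidth`. It should not be read as the paper's algorithm.

```python
import numpy as np

def soft_threshold(z, t):
    # Soft-thresholding operator used by the lasso coordinate update.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso(X, y, w, lam, n_iter=200):
    """Coordinate descent for
    0.5 * sum_i w_i (y_i - b0 - x_i . b)^2 + lam * ||b||_1,
    with the weights normalised to sum to one (assumed formulation)."""
    w = w / w.sum()
    x_mean, y_mean = w @ X, w @ y          # weighted means
    Xc, yc = X - x_mean, y - y_mean        # weighted centring handles the intercept
    beta = np.zeros(X.shape[1])
    col_norm = (w[:, None] * Xc ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_norm[j] == 0.0:
                continue
            # Partial residual that excludes the contribution of variable j.
            r = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = (w * Xc[:, j]) @ r
            beta[j] = soft_threshold(rho, lam) / col_norm[j]
    return y_mean - x_mean @ beta, beta

def local_lasso_predict(X, y, x0, lam=0.05, bandwidth=1.0, n_rounds=3):
    """Lazy prediction at query point x0: alternate between kernel weights
    and a weighted lasso fit. Restricting distances to the selected
    variables is an assumed update rule, not taken from the source."""
    active = np.ones(X.shape[1], dtype=bool)
    for _ in range(n_rounds):
        d2 = ((X[:, active] - x0[active]) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))   # closer points weigh more
        b0, beta = weighted_lasso(X, y, w, lam)
        new_active = beta != 0
        if not new_active.any() or (new_active == active).all():
            break
        active = new_active
    return b0 + x0 @ beta, beta

# Synthetic usage: only the first of five variables drives the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0]
x0 = np.array([0.5, 0.0, 0.0, 0.0, 0.0])
pred, beta = local_lasso_predict(X, y, x0)
print(pred, beta)
```

In this toy setting the fit is computed lazily, once per query point, and the L1 penalty shrinks the coefficient of the relevant variable slightly while setting the irrelevant ones to zero. A cross-validated choice of `lam` and `bandwidth` would be needed in practice.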
