Tikhonov or Lasso Regularization: Which Is Better and When

It is well known that supervised learning problems with ℓ1 (Lasso) and ℓ2 (Tikhonov or Ridge) regularizers yield very different solutions. For example, the ℓ1 solution vector is sparser and can therefore be used for feature selection as well as prediction. However, given a data set it is often hard to determine which form of regularization is more appropriate in a given context. In this paper we combine mathematical properties of the two regularization methods with detailed experimentation to understand their impact along four characteristics: non-stationarity of the data-generating process, the level of noise in the data-sensing mechanism, the degree of correlation between the dependent and independent variables, and the shape of the data set. The practical outcome of our study is a guide that practitioners of large-scale data mining and machine learning can use in their day-to-day work.
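
The sparsity contrast that motivates the paper can be seen in a small experiment. The sketch below is our own illustration, not the paper's experimental setup: scikit-learn, the synthetic data, and the regularization strengths are assumptions made for exposition. It fits Lasso (ℓ1) and Ridge (ℓ2) regression to the same data, whose ground-truth coefficient vector is sparse, and counts the non-zero coefficients each estimator returns.

```python
# Illustration only: compare the sparsity of l1 (Lasso) and l2 (Ridge) solutions
# on synthetic data with a sparse ground-truth coefficient vector.
# scikit-learn and all parameter choices are assumptions, not the paper's setup.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n_samples, n_features, n_informative = 200, 50, 5

# Design matrix and a coefficient vector with only 5 informative features.
X = rng.standard_normal((n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:n_informative] = rng.uniform(1.0, 3.0, n_informative)
y = X @ true_coef + 0.1 * rng.standard_normal(n_samples)

lasso = Lasso(alpha=0.1).fit(X, y)  # l1-regularized least squares
ridge = Ridge(alpha=1.0).fit(X, y)  # l2-regularized least squares

print("non-zero Lasso coefficients:", int(np.sum(np.abs(lasso.coef_) > 1e-8)))
print("non-zero Ridge coefficients:", int(np.sum(np.abs(ridge.coef_) > 1e-8)))
```

On such data the Lasso typically keeps only a handful of non-zero coefficients, close to the true support, while Ridge shrinks every coefficient toward zero without eliminating any; this is why only the ℓ1 solution can double as a feature selector.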
