Group Regularized Estimation Under Structural Hierarchy

ABSTRACT Variable selection for models that include interactions between explanatory variables often needs to obey certain hierarchical constraints. Weak structural hierarchy requires that whenever an interaction term is in the model, at least one of its associated main effects is also present; strong structural hierarchy requires both. This problem has recently attracted considerable attention, but existing computational algorithms converge slowly even with a moderate number of predictors. Moreover, in contrast to the rich literature on ordinary variable selection, there is a lack of statistical theory showing that hierarchical variable selection achieves reasonably low error rates. This work investigates a new class of estimators that use multiple group penalties to capture structural parsimony. We show that the proposed estimators enjoy sharp rate oracle inequalities, and we give minimax lower bounds for strong and weak hierarchical variable selection. A general-purpose algorithm is developed with guaranteed convergence and global optimality. Simulations and real data experiments demonstrate the efficiency and efficacy of the proposed approach. Supplementary materials for this article are available online.
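To make the two hierarchy constraints concrete, the following minimal sketch checks whether a selected model obeys weak or strong hierarchy. It illustrates only the definition given in the abstract, not the proposed estimator or algorithm; the function name `obeys_hierarchy` and the index-based encoding of main effects and interactions are assumptions made for this example.

```python
from typing import Iterable, Set, Tuple


def obeys_hierarchy(main_effects: Set[int],
                    interactions: Iterable[Tuple[int, int]],
                    strong: bool = True) -> bool:
    """Check a selected model against the hierarchy constraint.

    main_effects: indices of predictors whose main effects are in the model.
    interactions: pairs (j, k) of predictor indices whose interaction is in the model.
    strong: if True, require both main effects; if False (weak), at least one.
    """
    needed = 2 if strong else 1
    for j, k in interactions:
        # Count how many of the two parent main effects are selected.
        present = int(j in main_effects) + int(k in main_effects)
        if present < needed:
            return False
    return True


# Hypothetical selection: main effects 0 and 2, interactions (0, 2) and (0, 1).
selected_main = {0, 2}
selected_inter = [(0, 2), (0, 1)]

strong_ok = obeys_hierarchy(selected_main, selected_inter, strong=True)
weak_ok = obeys_hierarchy(selected_main, selected_inter, strong=False)
print(strong_ok, weak_ok)
```

In this hypothetical selection the interaction (0, 1) has only one of its main effects in the model, so the selection satisfies weak hierarchy but violates strong hierarchy.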
