Automated calibration for stability selection in penalised regression and graphical models: a multi-OMICs network application exploring the molecular response to tobacco smoking

Stability selection is an attractive approach for identifying sparse sets of features jointly associated with an outcome in high-dimensional settings. We introduce an automated calibration procedure that maximises an in-house stability score and accommodates data with a priori known block structure (e.g. multi-OMIC data). It applies to LASSO-penalised regression and graphical models. Simulations show that our approach outperforms both non-stability-based approaches and stability selection with the original calibration. Applying the multi-block graphical LASSO to real epigenetic and transcriptomic data from the Norwegian Women and Cancer study reveals a central, credible and novel cross-OMIC role for LRRN3 in the biological response to smoking.
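As a rough illustration of the generic stability selection idea that the abstract builds on, the sketch below fits a LASSO on repeated random subsamples and keeps the features whose selection proportion reaches a threshold. It is not the authors' implementation: the function names (`lasso_cd`, `selection_proportions`), the simulated data, and the fixed values of the penalty `lam` and threshold `pi_thr` are all assumptions made for illustration. In particular, the paper's stability score, which would choose the penalty and threshold automatically, is not described in the abstract and is not reproduced here. The sketch also assumes centred predictors and no intercept.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes centred columns and no intercept (illustrative simplification).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the partial residual (feature j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


def selection_proportions(X, y, lam, n_subsamples=30, frac=0.5, seed=0):
    """Proportion of subsamples in which each feature has a non-zero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = int(frac * n)
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), m)  # subsample without replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_subsamples for c in counts]


# Hypothetical simulated data: only features 0 and 1 carry signal.
data_rng = random.Random(1)
n_obs, n_feat = 100, 8
X = [[data_rng.gauss(0.0, 1.0) for _ in range(n_feat)] for _ in range(n_obs)]
y = [2.0 * row[0] - 2.0 * row[1] + data_rng.gauss(0.0, 0.5) for row in X]

lam, pi_thr = 0.4, 0.9  # fixed by hand here; the paper calibrates both automatically
props = selection_proportions(X, y, lam)
stable = [j for j, prop in enumerate(props) if prop >= pi_thr]
```

Here `stable` holds the indices of the stably selected features for one hand-picked pair of penalty and threshold. A calibration procedure of the kind the abstract describes would instead evaluate a grid of such pairs and keep the one that maximises a stability criterion.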
