The partitioned LASSO-patternsearch algorithm with application to gene expression data

Background: In systems biology, the task of reverse engineering gene pathways from data has been limited not just by the curse of dimensionality (the interaction space is huge) but also by systematic error in the data. The gene expression barcode reduces spurious association driven by batch effects and probe effects. The binary nature of the resulting expression calls lends itself well to modern regularization approaches that thrive in high-dimensional settings.

Results: The Partitioned LASSO-Patternsearch algorithm is proposed to identify patterns of multiple dichotomous risk factors for outcomes of interest in genomic studies. A partitioning scheme is used to identify promising patterns by solving many LASSO-Patternsearch subproblems in parallel. All variables that survive this stage proceed to an aggregation stage, where the most significant patterns are identified by solving a reduced LASSO-Patternsearch problem in just these variables. This approach was applied to genetic data sets with expression levels dichotomized by the gene expression barcode. Most of the genes and second-order interactions thus selected are known to be related to the outcomes.

Conclusions: We demonstrate with simulations and data analyses that, in comparison to several alternative methodologies, the proposed method not only selects variables and patterns more accurately but also provides smaller models with better prediction accuracy.
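
The two-stage scheme described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation. Several pieces are assumptions made for illustration only: the function names (`make_patterns`, `fit_l1_logistic`, `partitioned_lps`), the parameters `n_blocks` and `lam`, the choice to form subproblems from pairs of contiguous variable blocks, the restriction to main effects and second-order interactions, and the use of a plain proximal-gradient (soft-thresholding) solver for the L1-penalized logistic regression. The data in the demo are synthetic.

```python
import math
from itertools import combinations, product


def make_patterns(variables):
    """Main effects and second-order interactions over the given variable indices."""
    vs = sorted(variables)
    return [(v,) for v in vs] + list(combinations(vs, 2))


def fit_l1_logistic(X, y, patterns, lam, steps=500, lr=0.5):
    """L1-penalized logistic regression on pattern indicators (proximal gradient).

    A stand-in solver for illustration. Returns {pattern: weight} for the
    patterns whose weight is nonzero after the final iteration.
    """
    n = len(X)
    # For each row, indices of the patterns that are "on" (all member variables are 1).
    active = [[j for j, pat in enumerate(patterns) if all(row[v] for v in pat)]
              for row in X]
    ybar = min(max(sum(y) / n, 1e-6), 1 - 1e-6)
    b = math.log(ybar / (1 - ybar))  # unpenalized intercept, started at the marginal log-odds
    w = [0.0] * len(patterns)
    for _ in range(steps):
        gb, gw = 0.0, [0.0] * len(patterns)
        for idx, yi in zip(active, y):
            z = b + sum(w[j] for j in idx)
            z = max(-30.0, min(30.0, z))         # guard against overflow in exp
            r = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            gb += r
            for j in idx:
                gw[j] += r
        b -= lr * gb / n
        for j in range(len(w)):
            v = w[j] - lr * gw[j] / n
            w[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)  # soft-thresholding
    return {pat: wj for pat, wj in zip(patterns, w) if wj != 0.0}


def partitioned_lps(X, y, n_blocks, lam):
    """Stage 1: independent subproblems; stage 2: reduced problem on the survivors."""
    p = len(X[0])
    size = -(-p // n_blocks)  # ceiling division
    blocks = [list(range(s, min(s + size, p))) for s in range(0, p, size)]
    # Assumed partitioning: one subproblem per pair of blocks, so that
    # interactions across blocks are also screened.
    groups = [a + b for a, b in combinations(blocks, 2)] or blocks
    survivors = set()
    for group in groups:  # subproblems are independent and could be run in parallel
        for pat in fit_l1_logistic(X, y, make_patterns(group), lam):
            survivors.update(pat)
    # Aggregation: refit using only the variables that survived stage 1.
    return fit_l1_logistic(X, y, make_patterns(survivors), lam)


# Toy demo on synthetic data: all 64 combinations of 6 binary variables,
# with the outcome equal to 1 exactly when variables 0 and 1 are both "on".
X = [list(row) for row in product([0, 1], repeat=6)]
y = [row[0] * row[1] for row in X]
model = partitioned_lps(X, y, n_blocks=3, lam=0.05)
print(model)
```

Because the stage-1 subproblems share no state, the loop over `groups` is the natural place to parallelize. The aggregation stage then works with far fewer candidate patterns than a single fit over all variables would.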