Gene Expression Network Reconstruction by Convex Feature Selection when Incorporating Genetic Perturbations

Cellular gene expression measurements contain regulatory information that can be used to discover novel network relationships. Here, we present a new algorithm for network reconstruction powered by the adaptive lasso, a theoretically and empirically well-behaved method for selecting the regulatory features of a network. Any network discovery algorithm that makes use of directed probabilistic graphs requires perturbations, produced either by experiments or by naturally occurring genetic variation, to infer unique regulatory relationships from gene expression data. Our approach makes use of appropriately selected cis-expression quantitative trait loci (cis-eQTL), which provide a sufficient set of independent perturbations for maximum network resolution. We compare the performance of our network reconstruction algorithm to four other approaches: the PC-algorithm, QTLnet, the QDG algorithm, and the NEO algorithm, all of which have been used to reconstruct directed networks among phenotypes by leveraging QTL. We show that the adaptive lasso can outperform these algorithms for networks of ten genes and ten cis-eQTL, and is competitive with the QDG algorithm for networks of thirty genes and thirty cis-eQTL, given rich topologies and hundreds of samples. Using this novel approach, we identify unique sets of directed relationships in Saccharomyces cerevisiae by analyzing genome-wide gene expression data from an intercross between a wild strain and a lab strain. We recover novel putative network relationships between a tyrosine biosynthesis gene (TYR1) and genes involved in endocytosis (RCY1), the spindle checkpoint (BUB2), sulfonate catabolism (JLP1), and cell-cell communication (PRM7). Our algorithm provides a synthesis of feature selection methods and graphical model theory that has the potential to reveal new directed regulatory relationships from the analysis of population-level genetic and gene expression data.
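To make the feature selection step concrete, the following is a minimal sketch of a generic adaptive lasso, not the authors' implementation. It assumes a single target gene regressed on a handful of candidate regulators, uses an unpenalized least squares fit as the initial estimate, and solves the weighted problem by plain coordinate descent. The function names, the simulated data, and the values of `lam` and `gamma` are all illustrative assumptions; the handling of cis-eQTL genotypes described in the abstract is omitted.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam * sum_j weights[j] * |b_j|
    by cyclic coordinate descent. X is a list of rows, y a list."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes j itself
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def adaptive_lasso(X, y, lam, gamma=1.0, eps=1e-8):
    """Two-stage adaptive lasso: an initial unpenalised fit sets
    per-feature penalty weights 1/|b_init|^gamma (assumed choice)."""
    p = len(X[0])
    init = weighted_lasso(X, y, 0.0, [1.0] * p)  # lam = 0 gives least squares
    weights = [1.0 / (abs(b) ** gamma + eps) for b in init]
    return weighted_lasso(X, y, lam, weights)

# Simulated example (hypothetical data): 5 candidate regulators, of which
# only features 0 and 2 truly influence the target gene's expression.
random.seed(0)
n, p = 200, 5
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [1.5 * row[0] - 1.0 * row[2] + random.gauss(0.0, 0.5) for row in X]

beta = adaptive_lasso(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected regulators:", selected)
print("coefficients:", [round(b, 3) for b in beta])
```

The data-dependent weights are what distinguish this from the ordinary lasso: features with small initial coefficients receive a large penalty and tend to be dropped, while features with large initial coefficients are shrunk only lightly. In practice the penalty `lam` would be chosen by cross-validation or an information criterion rather than fixed by hand.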
