Inferring causal phenotype networks using structural equation models

Phenotypic traits may exert causal effects on one another. For example, high yield in dairy cows may increase the liability to certain diseases and, conversely, the incidence of a disease may reduce yield. Likewise, the transcriptome may be a function of reproductive status in mammals, and reproductive status may in turn depend on other physiological variables. Knowledge of the phenotype networks describing such interrelationships can be used to predict the behavior of complex systems, e.g. the biological pathways underlying complex traits such as diseases, growth and reproduction. Structural Equation Models (SEM) can be used to study recursive and simultaneous relationships among phenotypes in multivariate systems such as those arising in genetical genomics, systems biology, and multiple trait models in quantitative genetics. Hence, SEM can produce an interpretation of the relationships among traits that differs from the one obtained with traditional multiple trait models, in which all relationships are represented by symmetric linear associations among random variables, such as covariances and correlations. In this review, we discuss the application of SEM and related techniques to the study of multiple phenotypes. Two basic scenarios are considered: one pertaining to genetical genomics studies, in which QTL or molecular marker information is used to facilitate causal inference, and another related to quantitative genetic analysis in livestock, in which only phenotypic and pedigree information is available. The advantages and limitations of SEM compared to the traditional approaches commonly used for the analysis of multiple traits, together with some directions for future research in this area, are presented in a concluding section.
