A New Method to Infer Causal Phenotype Networks Using QTL and Phenotypic Information

In genetics and breeding research on multiple phenotypic traits, reconstructing the directional or causal structure among the traits is a prerequisite for quantifying the effects of genetic interventions on them. Current approaches mainly exploit the genetic effects at quantitative trait loci (QTLs) to learn about causal relationships among phenotypic traits. These approaches require that at least one unique QTL has been identified for each trait studied. In practice, however, especially for molecular phenotypes such as metabolites, this prerequisite is often not met because of limited sample sizes, high noise levels and small QTL effects. Here, we present a novel heuristic search algorithm, the QTL+phenotype supervised orientation (QPSO) algorithm, to infer causal directions for edges in undirected phenotype networks. The algorithm has two main advantages: first, it does not require a QTL for every trait; second, when orienting undirected edges between traits, it takes into account associated phenotypic interactions in addition to detected QTLs. We evaluate QPSO and compare its performance with that of another state-of-the-art approach, the QTL-directed dependency graph (QDG) algorithm. Simulation results show that our method has broader applicability and leads to more accurate overall orientations. We also illustrate our method with a real-life example involving 24 metabolites and a few major QTLs measured on an association panel of 93 tomato cultivars. MATLAB source code implementing the proposed algorithm is freely available upon request.
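The general idea of using a QTL to orient an undirected edge between two traits can be illustrated with a small sketch. This is not the QPSO or QDG algorithm, whose details the abstract does not give; it is a generic likelihood comparison under assumed conditions: a single QTL `q`, two traits `a` and `b` joined by an undirected edge, Gaussian linear models, and the Bayesian information criterion as the score. The function names, the simulated effect sizes and the sample size are all hypothetical.

```python
import math
import random


def gaussian_loglik(y, x):
    """Maximised Gaussian log-likelihood of the simple regression y ~ 1 + x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    rss = syy - sxy ** 2 / sxx          # residual sum of squares
    sigma2 = rss / n                    # ML estimate of the residual variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def bic(loglik, n_params, n):
    """Bayesian information criterion (lower is better)."""
    return -2 * loglik + n_params * math.log(n)


def orient_edge(q, a, b):
    """Orient the edge A - B by comparing the chains q -> A -> B and q -> B -> A.

    Each chain consists of two simple regressions with three parameters each
    (intercept, slope, residual variance), so both models have 6 parameters.
    """
    n = len(q)
    bic_ab = bic(gaussian_loglik(a, q) + gaussian_loglik(b, a), 6, n)
    bic_ba = bic(gaussian_loglik(b, q) + gaussian_loglik(a, b), 6, n)
    return "A->B" if bic_ab < bic_ba else "B->A"


# Simulated data (assumed values): the QTL acts on A, and A acts on B.
rng = random.Random(0)
n = 500
q = [rng.choice([0.0, 1.0]) for _ in range(n)]
a = [2.0 * qi + rng.gauss(0, 1) for qi in q]
b = [ai + rng.gauss(0, 1) for ai in a]

print(orient_edge(q, a, b))
```

In this sketch the QTL breaks the symmetry between the two directions: if `q` influences `b` only through `a`, the chain `q -> A -> B` fits the data better than the reverse chain. A comparison of this kind needs a QTL for at least one trait on the edge, which is the requirement the abstract describes as often unmet for molecular phenotypes.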
