Phylogenetic analysis of gene expression.

Phylogenetic analyses of gene expression have great potential for addressing a wide range of questions. These analyses will, for example, identify genes whose evolutionary shifts in expression are correlated with evolutionary changes in morphological, physiological, and developmental characters of interest. This will provide entirely new opportunities to identify genes related to particular phenotypes. There are, however, three key challenges that must be addressed for such studies to realize their potential. First, gene expression must be measured in multiple species, some of which may be field-collected, and parameterized in such a way that the measurements can be compared across species. Second, it will be necessary to develop phylogenetic comparative methods suitable for large multidimensional datasets. In most phylogenetic comparative studies to date, the number n of independent observations (independent contrasts) has been greater than the number p of variables (characters). The behavior of comparative methods for these classic n > p problems is now well understood under a wide variety of conditions. In studies of gene expression, and in studies based on other high-throughput tools, the number n of samples is dwarfed by the number p of variables. Because a p × p covariance matrix estimated from n observations has rank at most n, the estimated covariance matrices will be singular, which complicates their analysis and interpretation and makes them prone to spurious results. Third, new approaches are needed to investigate the expression of the many genes whose phylogenies are not congruent with species phylogenies because of gene loss, gene duplication, and incomplete lineage sorting. Here we outline general considerations for project design in phylogenetic analyses of gene expression and suggest solutions to these three categories of challenges. These topics are relevant to high-throughput phenotypic data well beyond gene expression.
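
The following minimal sketch illustrates why the n < p case is a problem. It assumes Brownian-motion evolution and uses simulated (not real) standardized contrasts. The code estimates a p × p covariance matrix for 20 hypothetical genes from 5 contrasts, checks its numerical rank, and then applies linear shrinkage toward the diagonal. Shrinkage is used here only as one familiar regularization; it is not the method proposed in this work. The function names, the random seed, and the shrinkage weight `lam` are all illustrative assumptions.

```python
import random


def contrast_covariance(contrasts):
    """p x p covariance of characters estimated from m standardized contrasts.

    Under Brownian motion, standardized contrasts have expectation zero, so
    cross-products are taken through the origin (no mean-centering).
    """
    m, p = len(contrasts), len(contrasts[0])
    return [[sum(c[i] * c[j] for c in contrasts) / m for j in range(p)]
            for i in range(p)]


def numerical_rank(matrix, rel_tol=1e-9):
    """Rank of a symmetric positive semidefinite matrix.

    Uses diagonally pivoted (Cholesky-style) elimination: repeatedly take the
    largest remaining diagonal entry as pivot and form the Schur complement.
    """
    a = [row[:] for row in matrix]
    remaining = list(range(len(a)))
    scale = max(a[i][i] for i in remaining)
    rank = 0
    while remaining:
        k = max(remaining, key=lambda i: a[i][i])
        pivot = a[k][k]
        if pivot <= rel_tol * scale:
            break  # the remaining Schur complement is numerically zero
        remaining.remove(k)
        for i in remaining:
            factor = a[i][k] / pivot
            for j in remaining:
                a[i][j] -= factor * a[k][j]
        rank += 1
    return rank


def shrink_to_diagonal(cov, lam):
    """Linear shrinkage (1 - lam) * S + lam * diag(S); lam is an assumed weight."""
    p = len(cov)
    return [[cov[i][j] if i == j else (1 - lam) * cov[i][j] for j in range(p)]
            for i in range(p)]


# Hypothetical setting: 6 species give m = 5 contrasts for p = 20 genes.
# Values are simulated, not real expression data.
rng = random.Random(1)
m_contrasts, p_genes = 5, 20
contrasts = [[rng.gauss(0.0, 1.0) for _ in range(p_genes)]
             for _ in range(m_contrasts)]

S = contrast_covariance(contrasts)
S_shrunk = shrink_to_diagonal(S, lam=0.2)

print("rank of S:", numerical_rank(S), "of", p_genes)
print("rank after shrinkage:", numerical_rank(S_shrunk), "of", p_genes)
```

With only 5 contrasts, the raw estimate cannot have rank above 5, so it is singular. The shrunken matrix is positive definite because its diagonal variances are positive, and it is therefore full rank. The weight `lam = 0.2` is arbitrary here. In practice it is usually chosen from the data, for example with the Ledoit-Wolf estimator or by cross-validation. Thresholding small covariances is another common alternative.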
