Analysis of comparative data with hierarchical autocorrelation

The asymptotic behavior of estimates and information criteria in linear models is studied in the context of hierarchically correlated sampling units. The work is motivated by biological data collected on species, where autocorrelation is based on the species' genealogical tree. Hierarchical autocorrelation is also found in many other kinds of data, such as data from microarray experiments or on human languages. Similar correlation also arises in ANOVA models with nested effects. I show that, under Brownian motion evolution on the genealogical tree, the best linear unbiased estimators are almost surely convergent but may not be consistent for some parameters, such as the intercept and lineage effects. For the purpose of model selection, I show that the usual BIC does not provide an appropriate approximation to the posterior probability of a model. To correct for this, an effective sample size is introduced for parameters that are inconsistently estimated. For biological studies, this work implies that tree-aware sampling design is desirable: adding more sampling units may not help ancestral reconstruction, and only strong lineage effects may be detected with high power.
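
The inconsistency of the intercept estimate can be illustrated with a small numerical sketch. This is not code from the paper: the toy tree, the function names (`bm_covariance`, `solve`, `intercept_variance`), and the unit Brownian motion rate are all assumptions made for illustration. The tree has k lineages leaving the root, each evolving for time t before splitting into m tips that evolve independently for time 1 - t, so two tips in the same lineage have covariance t and each tip has variance 1. The variance of the best linear unbiased estimate of the intercept is then 1 / (1' V^-1 1), where V is the tip covariance matrix.

```python
def bm_covariance(k, m, t):
    """Tip covariance under unit-rate Brownian motion on a toy tree
    (an illustrative assumption): k root lineages of length t, each
    ending in m tips with independent branches of length 1 - t.
    Requires 0 < t < 1 so that the matrix is positive definite."""
    n = k * m
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                V[i][j] = 1.0          # total root-to-tip time
            elif i // m == j // m:
                V[i][j] = t            # shared history within a lineage
    return V


def solve(A, b):
    """Solve A x = b by Gaussian elimination without pivoting,
    which is adequate here because A is symmetric positive definite."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for c in range(n):
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            if f != 0.0:
                for j in range(c, n + 1):
                    A[r][j] -= f * A[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = A[r][n] - sum(A[r][j] * x[j] for j in range(r + 1, n))
        x[r] = s / A[r][r]
    return x


def intercept_variance(V):
    """Variance of the GLS (BLUE) intercept estimate: 1 / (1' V^-1 1)."""
    ones = [1.0] * len(V)
    return 1.0 / sum(solve(V, ones))


# Add more tips per lineage while keeping k = 2 root lineages fixed.
for m in (1, 5, 25, 100):
    print(m, intercept_variance(bm_covariance(k=2, m=m, t=0.5)))
```

For this toy tree the variance has the closed form (t + (1 - t)/m) / k, so with k = 2 and t = 0.5 it decreases toward t/k = 0.25 as m grows but never reaches zero. Adding tips within existing lineages therefore cannot make the intercept estimate consistent in this setting; only adding independent root lineages (larger k) shrinks the limit.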
