Estimating Tree-Structured Covariance Matrices via Mixed-Integer Programming

We present a novel method for estimating tree-structured covariance matrices directly from observed continuous data. Specifically, we estimate a covariance matrix from observations of p continuous random variables encoding a stochastic process over a tree with p leaves. A representation of this class of matrices as linear combinations of rank-one matrices indicating object partitions is used to formulate estimation as instances of well-studied numerical optimization problems. In particular, our estimates are based on projection, where the covariance estimate is the nearest tree-structured covariance matrix to an observed sample covariance matrix. The problem is posed as a linear or quadratic mixed-integer program (MIP) where a setting of the integer variables in the MIP specifies a set of tree topologies of the structured covariance matrix. We solve these problems to optimality using efficient and robust existing MIP solvers. We present a case study in phylogenetic analysis of gene expression and a simulation study comparing our method to distance-based tree estimation procedures.
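To make the rank-one representation concrete, the following is a minimal sketch rather than the paper's MIP formulation. It assumes each tree edge is described by the set of leaves below it (a "clade") and a nonnegative branch length, builds the covariance as a sum of length-weighted indicator outer products, and evaluates the Frobenius distance that a projection would minimize. The function names, the 4-leaf tree, and all branch lengths are hypothetical and chosen only for illustration; the search over topologies, which the paper delegates to a MIP solver, is not shown.

```python
from itertools import combinations


def tree_covariance(p, clades):
    """Return sum_k d_k * v_k v_k^T, where v_k is the 0/1 leaf-indicator
    vector of clade k and d_k is its branch length (assumed nonnegative)."""
    B = [[0.0] * p for _ in range(p)]
    for leaves, length in clades:
        for i in leaves:
            for j in leaves:
                B[i][j] += length  # entry (i, j) of length * v v^T
    return B


def is_nested(clades):
    """Check that any two clades are either disjoint or nested, which is
    the condition for the leaf sets to come from a single rooted tree."""
    sets = [set(leaves) for leaves, _ in clades]
    return all(a.isdisjoint(b) or a <= b or b <= a
               for a, b in combinations(sets, 2))


def frobenius_distance(A, B):
    """Frobenius norm of A - B, one possible projection objective."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b)) ** 0.5


# Hypothetical 4-leaf tree ((0,1),(2,3)) with made-up branch lengths.
clades = [
    ((0, 1, 2, 3), 1.0),   # root edge shared by every leaf
    ((0, 1), 0.5),         # internal edge above leaves 0 and 1
    ((2, 3), 0.25),        # internal edge above leaves 2 and 3
    ((0,), 0.1), ((1,), 0.2), ((2,), 0.3), ((3,), 0.4),  # leaf edges
]
B = tree_covariance(4, clades)
```

In this sketch the covariance between two leaves equals the total length of the edges they share on the way from the root, so leaves 0 and 1 covary more strongly than leaves 0 and 2. For a fixed topology, fitting the branch lengths to a sample covariance matrix is a continuous problem; choosing which clades are active is the combinatorial part that integer variables would encode.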
