Analysis of a data matrix and a graph: Metagenomic data and the phylogenetic tree

In biological experiments, researchers often have information in the form of a graph that supplements the observed numerical data. Incorporating the knowledge contained in such graphs into an analysis of the numerical data is an important and nontrivial task. We look at the example of metagenomic data, that is, data from a genomic survey of the abundance of different bacterial species in a sample. Here, the graph of interest is a phylogenetic tree depicting the evolutionary relationships among the bacterial species. We illustrate that analyzing the data in a nonstandard inner-product space makes effective use of this additional graphical information and produces more meaningful results.
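To make the idea of a nonstandard inner-product space concrete, the following is a minimal sketch of a principal component analysis carried out under the inner product x^T Q y rather than the usual x^T y. It is an illustration under stated assumptions, not necessarily the procedure used in the paper: the function name `generalized_pca`, the toy four-species tree, and the choice of Q as the shared root-to-ancestor branch length between each pair of species are all hypothetical.

```python
import numpy as np

def generalized_pca(X, Q, n_components=2):
    """PCA of the rows of X under the inner product <x, y>_Q = x^T Q y.

    X : (n_samples, n_species) abundance matrix.
    Q : (n_species, n_species) symmetric positive semidefinite matrix.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                # center each species column
    # Symmetric square root of Q from its eigendecomposition.
    w, U = np.linalg.eigh(Q)
    w = np.clip(w, 0.0, None)              # guard against tiny negative eigenvalues
    Q_half = (U * np.sqrt(w)) @ U.T
    # Euclidean geometry of Xc @ Q_half equals Q-geometry of Xc,
    # so an ordinary SVD of the transformed data does the job.
    Z = Xc @ Q_half
    U_z, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U_z[:, :n_components] * s[:n_components]
    eigvals = (s ** 2 / X.shape[0])[:n_components]
    return scores, eigvals

# Hypothetical tree ((A,B),(C,D)) with every branch of length 1.
# Assumption: Q[i, j] is the branch length shared by species i and j
# on their paths from the root.
Q = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

# Made-up abundance data: 5 samples by 4 species.
rng = np.random.default_rng(0)
X = rng.poisson(lam=10.0, size=(5, 4)).astype(float)

scores, eigvals = generalized_pca(X, Q, n_components=2)
```

Passing `np.eye(4)` as `Q` recovers ordinary PCA, so any difference in the resulting sample scores comes only from the tree-derived inner product.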
