The statistical geometry of transcriptome divergence in cell-type evolution and cancer

In evolution, body plan complexity increases due to an increase in the number of individualized cell types. Yet, there is very little understanding of the mechanisms that produce this form of organismal complexity. One model for the origin of novel cell types is the sister cell-type model. According to this model, each cell type arises together with a sister cell type through specialization from an ancestral cell type. A key prediction of the sister cell-type model is that gene expression profiles of cell types exhibit tree structure. Here we present a statistical model for detecting tree structure in transcriptomic data and apply it to transcriptomes from ENCODE and FANTOM5. We show that transcriptomes of normal cells harbour substantial amounts of hierarchical structure. In contrast, cancer cell lines have less tree structure, suggesting that the emergence of cancer cells follows different principles from that of evolutionary cell-type origination.
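
To make the notion of "tree structure" in a distance matrix concrete, the following is a minimal sketch of the four-point condition, a generic tree-likeness measure. It is not the statistical model presented in this paper, which the abstract does not specify. For any four samples, the three pairwise sums of distances are compared: on a perfect tree the two largest sums are equal, so the quartet score is 0, and scores approach 1 as the data depart from a tree. The function names and the toy distance matrices are illustrative assumptions.

```python
from itertools import combinations


def quartet_delta(d, x, y, u, v):
    """Four-point score for one quartet; 0 means perfectly tree-like."""
    # The three ways of pairing four samples into two pairs.
    sums = sorted(
        [d[x][y] + d[u][v], d[x][u] + d[y][v], d[x][v] + d[y][u]],
        reverse=True,
    )
    largest, middle, smallest = sums
    if largest == smallest:
        # All three sums equal (star-like quartet): treat as tree-like.
        return 0.0
    return (largest - middle) / (largest - smallest)


def mean_delta(d):
    """Average quartet score over all quartets of a symmetric distance matrix."""
    n = len(d)
    if n < 4:
        raise ValueError("need at least four samples")
    scores = [quartet_delta(d, *q) for q in combinations(range(n), 4)]
    return sum(scores) / len(scores)


# Toy example (assumed data): distances on the tree ((A,B),(C,D)) with
# pendant branches of length 1 and an internal branch of length 2.
tree_like = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]

# Toy example (assumed data): shortest-path distances on a 4-cycle,
# which cannot be represented by any tree.
cycle_like = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]
```

In practice the matrix `d` would hold pairwise distances between transcriptomes (for example, one minus a correlation of expression levels, which is itself an assumption here), and a lower mean score would indicate more tree-like data.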
