A Roadmap to Domain-Based Proteomics

Protein domains are reusable segments of proteins and play an important role in protein evolution. By combining elements from a relatively small set of domains into unique arrangements, a large number of distinct proteins can be generated. Since domains often have specific functions, changes in their arrangement usually affect the overall protein function. Furthermore, domains lend themselves well to computational representation, e.g., by Hidden Markov Models (HMMs), and these HMMs are widely available in various databases. Domains can therefore be used efficiently for proteomic analyses. Here, we describe how domains are annotated using different domain databases and how the annotation quality of proteomes can then be assessed. We next show how functional annotations of domains in large-scale data, such as whole genomes or transcriptomes, can be used to analyze molecular differences between species. Finally, we describe methods to analyze changes in the domain content of proteins, which helps significantly to characterize and reconstruct the modular evolution of proteins. Altogether, domain-based methods offer a computationally highly effective approach to analyzing large amounts of proteomic data in an evolutionary setting.
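To make the idea of analyzing changes in domain content concrete, the following minimal sketch treats a protein as an ordered list of domain identifiers and counts which domains were gained or lost between two arrangements. It is an illustration only, not the method described in the chapter: the function name `domain_changes`, the domain labels, and the two example arrangements are hypothetical, and real analyses would start from database annotations and a phylogeny.

```python
from collections import Counter


def domain_changes(ancestral, derived):
    """Compare two domain arrangements and return (gained, lost).

    Each arrangement is a list of domain identifiers in N- to C-terminal
    order. Only copy numbers are compared here; domain order is ignored.
    """
    ancestral_counts = Counter(ancestral)
    derived_counts = Counter(derived)
    # Counter subtraction keeps only positive differences.
    gained = derived_counts - ancestral_counts
    lost = ancestral_counts - derived_counts
    return dict(gained), dict(lost)


# Hypothetical arrangements with made-up domain labels (assumption).
ancestral = ["Kinase", "SH2"]
derived = ["SH3", "SH2", "Kinase", "Kinase"]

gained, lost = domain_changes(ancestral, derived)
print("gained:", gained)  # gained: {'SH3': 1, 'Kinase': 1}
print("lost:", lost)      # lost: {}
```

Comparing copy numbers rather than ordered sequences keeps the sketch short; detecting rearrangements such as domain shuffling would require comparing the ordered lists themselves.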
