Evolution of protein indels in plants, animals and fungi

Background: Insertions/deletions (indels) in protein sequences are useful as drug targets, protein structure predictors, species diagnostics and evolutionary markers. However, there is limited understanding of indel evolutionary patterns. We sought to characterize indel patterns, focusing first on the major groups of multicellular eukaryotes.

Results: Comparisons of complete proteomes from a taxonomically broad set of primarily Metazoa, Fungi and Viridiplantae yielded 299 substantial (>250 aa) universal, single-copy (in-paralog only) proteins, from which 901 simple (present/absent) and 3,806 complex (multistate) indels were extracted. Simple indels are mostly small (1-7 aa), with 1 aa as the most frequent size class. However, even these simple-looking indels show a surprisingly high level of hidden homoplasy (multiple independent origins). Among the apparently homoplasy-free simple indels, we identify 69 potential clade-defining indels (CDIs) that may warrant closer examination. CDIs show a very uneven taxonomic distribution among Viridiplantae (13 CDIs), Fungi (40 CDIs), and Metazoa (0 CDIs). An examination of singleton indels shows an excess of insertions over deletions in nearly all examined taxa. This excess averages 2.31-fold overall, with a maximum observed value of 7.5-fold.

Conclusions: We find considerable potential for identifying taxon-marker indels using an automated pipeline. However, it appears that simple indels in universal proteins are too rare and homoplasy-rich to be used for pure indel-based phylogeny. The excess of insertions over deletions seen in nearly every genome and major group examined may be useful in defining more realistic gap penalties for sequence alignment. This bias also suggests that insertions in highly conserved proteins experience less purifying selection than do deletions.
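The distinction between simple (present/absent) and complex (multistate) indels, and the insertion-over-deletion excess, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `classify_indel` and `insertion_excess` are hypothetical, and the working definition used here (a simple indel has exactly two states across taxa, residues fully present or fully absent) is an assumption made for illustration.

```python
def classify_indel(region):
    """Classify one gapped alignment region (hypothetical helper).

    `region` is a list of equal-length aligned strings, one per taxon,
    covering a single candidate indel; '-' marks a gap.
    Assumed definition: 'simple' means exactly two states across taxa,
    fully present or fully absent; anything else with length variation
    is 'complex' (multistate).
    """
    # Number of residues each taxon carries in this region.
    lengths = {len(seq.replace("-", "")) for seq in region}
    if len(lengths) == 1:
        return "none"      # no length variation, so no indel
    if len(lengths) == 2 and 0 in lengths:
        return "simple"    # present/absent
    return "complex"       # three or more states, or partial overlap


def insertion_excess(n_insertions, n_deletions):
    """Fold excess of insertions over deletions, as a plain ratio."""
    if n_deletions == 0:
        raise ValueError("ratio is undefined when no deletions are observed")
    return n_insertions / n_deletions


if __name__ == "__main__":
    print(classify_indel(["AKL", "AKL", "---"]))  # two states: present or absent
    print(classify_indel(["AKL", "A--", "---"]))  # three length states
    print(insertion_excess(15, 2))                # illustrative counts only
```

The counts passed to `insertion_excess` are made-up illustrative values, not data from the study. Deciding whether a singleton indel is an insertion or a deletion requires polarising it against related taxa, which this sketch does not attempt.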
