Bioinformatic flowchart and database to investigate the origins and diversity of Clan AA peptidases

Background: Clan AA of aspartic peptidases relates the family of pepsin monomers evolutionarily to all dimeric peptidases encoded by eukaryotic LTR retroelements. Recent findings describing various pools of single-domain nonviral host peptidases, in both prokaryotes and eukaryotes, indicate that the diversity of clan AA is larger than previously thought. The natural next step in investigating this enzyme group is to study its phylogeny. However, clan AA is a difficult case because of the low similarity and the different rates of evolution among its members. This work is an ongoing attempt to investigate the different clan AA families in order to understand the cause of their diversity.

Results: In this paper, we describe an in-progress database and bioinformatic flowchart designed to characterize the clan AA protein domain based on all possible protein families through ancestral reconstructions, sequence logos, and hidden Markov models (HMMs). The flowchart includes the characterization of a major consensus sequence based on six amino acid patterns that correspond to Andreeva's model, the structural template describing the clan AA peptidase fold. The set of tools is a work in progress that we have organized in a database within the GyDB project, referred to as the Clan AA Reference Database: http://gydb.uv.es/gydb/phylogeny.php?tree=caard.

Conclusion: The pre-existing classification, combined with the evolutionary history of LTR retroelements, permits a consistent taxonomical collection of sequence logos and HMMs. This set is useful for gene annotation and also serves as a reference for evaluating the diversity of, and the relationships among, the different families. Comparisons among HMMs suggest a common ancestor for all dimeric clan AA peptidases that is halfway between single-domain nonviral peptidases and those encoded by Ty3/Gypsy LTR retroelements. Sequence logos reveal that all clan AA families follow a similar protein domain architecture related to the peptidase fold. In particular, each family nucleates a particular consensus motif at the sequence position related to the flap. The different motifs constitute a network in which an alanine-asparagine-like variable motif predominates, instead of the canonical flap of the HIV-1 peptidase and its closer relatives.

Reviewers: This article was reviewed by Daniel H. Haft, Vladimir Kapitonov (nominated by Jerry Jurka), and Ben M. Dunn (nominated by Claus Wilke).
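As an illustration of the sequence logos mentioned in the abstract, the following minimal sketch computes the per-column information content (in bits) that determines the height of each logo stack. It is not the authors' pipeline: the function name `column_information`, the toy alignment, and the choice to skip gap characters and omit any small-sample correction are all assumptions made for this example.

```python
import math
from collections import Counter


def column_information(alignment, alphabet_size=20):
    """Return the information content in bits for each alignment column.

    Hypothetical helper for illustration: information is computed as
    log2(alphabet_size) minus the Shannon entropy of the observed residues.
    Gap characters ('-') are ignored, and no small-sample correction is applied.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_bits = math.log2(alphabet_size)
    info = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            # A column made only of gaps carries no information.
            info.append(0.0)
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
        info.append(max_bits - entropy)
    return info


# Toy alignment (made-up sequences, not taken from the database).
toy_alignment = ["DTGA", "DSGA", "DTGS", "DSGA"]
bits = column_information(toy_alignment)
for position, value in enumerate(bits, start=1):
    print(f"column {position}: {value:.3f} bits")
```

In this toy case the fully conserved columns reach the maximum of log2(20) bits, while the column that alternates between two residues loses exactly one bit, which is how a logo makes conserved and variable positions visible at a glance.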
