Indel PDB: A database of structural insertions and deletions derived from sequence alignments of closely related proteins

Background: Insertions and deletions (indels) are a common type of sequence variation. They are less studied than other variations and pose many important biological questions. Recent research has shown that the presence of sizable indels in a protein sequence may indicate the protein's essentiality and its role in protein interaction networks. The use of indels in structure-based drug design has also been demonstrated recently. Nonetheless, many structural and functional characteristics of indels remain poorly researched or unknown.

Description: We have created a web-based resource, Indel PDB, a structural database of insertions and deletions identified from sequence alignments of highly similar proteins in the Protein Data Bank (PDB). Indel PDB draws on the large amount of available structural information to characterize one-, two- and three-dimensional features of indel sites. It contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB contains more indel sequences with regular secondary structure, including alpha-helices and beta-sheets, in addition to loops. The insertion fragments are characterized by their sequence, length, location, secondary structure composition, solvent accessibility, protein domain association and three-dimensional structure.

Conclusion: Using the data available in Indel PDB, we have studied and present here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and the identification of indel-directed drug binding sites.
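The Description states that indel sites are identified from alignments of highly similar PDB sequences and then characterized by length, location and secondary structure composition. The following is a minimal Python sketch of that idea, not the Indel PDB pipeline. It assumes a pairwise alignment given as two equal-length gapped strings with `-` as the gap character, treats each maximal run of columns gapped in one sequence only as one indel site, and reduces the eight DSSP states to helix (H, G, I), strand (E, B) and loop (everything else), which is one common convention. The names `IndelSite`, `find_indels` and `ss_composition` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndelSite:
    start: int        # 0-based alignment column where the gap run begins
    end: int          # exclusive end column of the gap run
    length: int       # number of inserted residues
    inserted_in: str  # "A" or "B": the sequence that carries the extra residues
    fragment: str     # the inserted residues themselves
    location: str     # "N-terminal", "C-terminal" or "internal"


def find_indels(aln_a: str, aln_b: str) -> List[IndelSite]:
    """Return maximal runs of columns where exactly one sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aln_a)
    sites: List[IndelSite] = []
    i = 0
    while i < n:
        a, b = aln_a[i], aln_b[i]
        # Skip aligned residue pairs and columns that are gaps in both sequences.
        if (a == "-") == (b == "-"):
            i += 1
            continue
        gap_in_b = b == "-"
        gapped = aln_b if gap_in_b else aln_a   # sequence lacking the residues
        carrier = aln_a if gap_in_b else aln_b  # sequence carrying the insertion
        j = i
        while j < n and gapped[j] == "-" and carrier[j] != "-":
            j += 1
        # Location relative to the residues of the gapped sequence.
        has_before = any(c != "-" for c in gapped[:i])
        has_after = any(c != "-" for c in gapped[j:])
        if has_before and has_after:
            location = "internal"
        elif not has_before:
            location = "N-terminal"
        else:
            location = "C-terminal"
        sites.append(IndelSite(i, j, j - i, "A" if gap_in_b else "B",
                               carrier[i:j], location))
        i = j
    return sites


def ss_composition(dssp: str) -> Dict[str, float]:
    """Fraction of helix, strand and loop residues in a DSSP state string."""
    n = len(dssp)
    if n == 0:
        return {"helix": 0.0, "strand": 0.0, "loop": 0.0}
    helix = sum(c in "HGI" for c in dssp)
    strand = sum(c in "EB" for c in dssp)
    return {"helix": helix / n, "strand": strand / n,
            "loop": (n - helix - strand) / n}
```

For example, aligning `MKTAYIAKQR` against `MKT---AKQR` yields a single internal three-residue insertion, `AYI`, in the first sequence. Given the DSSP string for those residues, `ss_composition` returns the fraction of helix, strand and loop residues in the fragment.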
