Influence of Genomic and Other Biological Data Sets in the Understanding of Protein Structures, Functions and Interactions

In the post-genomic era, biological databases are growing at a tremendous rate. Despite this rapid accumulation of biological information, the functions and other biological properties of many putative gene products from various organisms remain unknown or obscure. This paper examines how strategic integration of large biological databases, and of different kinds of biological information, helps address some fundamental questions on protein structure, function and interactions. New developments in remote homology detection, together with strategic use of sequence databases, aid recognition of the functions of newly discovered proteins. Knowledge of 3-D structures, and the combined use of sequences and 3-D structures of homologous protein domains, greatly expand the reach of remote homology detection. The authors also demonstrate how considering the functions of the individual domains of a multi-domain protein together helps in recognizing its gross biological attributes. The paper also discusses a few cases in which combining disparate biological datasets yields new insights into protein-protein interactions between a host and a pathogen. Finally, the authors discuss how combining low-resolution structural data on gigantic multi-component assemblies, obtained from cryoEM studies, with atomic-level 3-D structures of the components is effective in inferring finer features of the assembly.
