Discovering co-occurring patterns and their biological significance in protein families

Background

The rapid growth in available biological sequences makes it important to identify and correlate conserved regions in homologous sequences in order to gain biological knowledge. These conserved regions contain statistically significant residue associations, which appear as sequence patterns. Patterns from two conserved regions that frequently co-occur on the same sequences are therefore inferred to have joint functionality. We propose a method for finding conserved regions in protein families whose patterns frequently co-occur. The biological significance of the discovered clusters of conserved regions with co-occurring patterns can be validated by the three-dimensional closeness of their amino acids and by the biological functions of those regions as reported in published work.

Methods

Using existing algorithms, we discovered statistically significant amino acid associations as sequence patterns. We then aligned and clustered them into Aligned Pattern Clusters (APCs), which correspond to conserved regions with amino acid conservation and variation. When one APC frequently co-occurs with another APC on the same sequences, the two APCs are said to have high co-occurrence. We then clustered APCs with high co-occurrence into what we refer to as Co-occurrence APC Clusters (Co-occurrence Clusters).

Results

Our results show that, within Co-occurrence Clusters, the three-dimensional distance between amino acids is smaller than the average distance between amino acids. For the Co-occurrence Clusters of the ubiquitin and cytochrome c families, we observed biological significance among the amino acids of the APCs within the same cluster. In ubiquitin, the residues are responsible for ubiquitination as well as for conventional and unconventional ubiquitin binding. In cytochrome c, amino acids in the first Co-occurrence Cluster contribute to the binding of other proteins in the electron transport chain, and amino acids in the second Co-occurrence Cluster contribute to the stability of the axial heme ligand.

Conclusions

Our co-occurrence clustering algorithm can efficiently find and rank conserved regions that contain patterns frequently co-occurring on the same proteins. Co-occurring patterns are biologically significant, as shown by their three-dimensional closeness and by other evidence reported in the literature. These results could support drug discovery by helping biologists quickly identify candidate drug targets for detailed preclinical studies.
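
As an illustration of the co-occurrence step described in Methods, the following minimal Python sketch scores co-occurrence between APCs and groups those that co-occur often. It is not the published implementation: the Jaccard index used as the co-occurrence score, the threshold of 0.5, the connected-components grouping, and all names and toy data (`apc_sequences`, `jaccard`, `cooccurrence_clusters`) are assumptions made for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of the sequences containing either APC that contain both."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence_clusters(apc_sequences, threshold=0.5):
    """Group APCs whose pairwise co-occurrence score meets the threshold.

    apc_sequences maps an APC name to the set of sequence IDs on which
    any of its patterns occurs. Clusters are the connected components of
    the graph linking two APCs when their score is >= threshold
    (an assumed, simplified grouping rule).
    """
    # Union-find parent pointers, one entry per APC.
    parent = {name: name for name in apc_sequences}

    def find(x):
        # Follow parent pointers to the root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of APCs with a high enough co-occurrence score.
    for a, b in combinations(apc_sequences, 2):
        if jaccard(apc_sequences[a], apc_sequences[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect APCs by root, then order clusters largest first.
    groups = {}
    for name in apc_sequences:
        groups.setdefault(find(name), []).append(name)
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical toy data: APC name -> IDs of sequences containing it.
apc_sequences = {
    "APC1": {"s1", "s2", "s3", "s4"},
    "APC2": {"s1", "s2", "s3"},
    "APC3": {"s5", "s6"},
    "APC4": {"s5", "s6", "s7"},
    "APC5": {"s8"},
}

clusters = cooccurrence_clusters(apc_sequences, threshold=0.5)
print(clusters)
```

With this toy data, APC1 and APC2 share three of four sequences and fall into one cluster, APC3 and APC4 share two of three and form a second, and APC5 remains on its own.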
