Modeling Protein Interacting Groups by Quasi-Bicliques: Complexity, Algorithm, and Application

Protein-protein interactions (PPIs) are among the most important mechanisms in cellular processes. To model protein interaction sites, recent studies have suggested first finding interacting protein group pairs in large PPI networks and then searching for conserved motifs within the protein groups to form interacting motif pairs. To account for noise and the incompleteness of biological data, we propose using quasi-bicliques to find interacting protein group pairs. We investigate two new problems that arise from finding interacting protein group pairs: the maximum vertex quasi-biclique problem and the maximum balanced quasi-biclique problem. We prove that both problems are NP-hard. This is a surprising result, as the widely known maximum vertex biclique problem is polynomial-time solvable [1]. We then propose a heuristic algorithm that uses a greedy method to find quasi-bicliques in PPI networks. Our experimental results on real data show that this algorithm performs better than a benchmark algorithm at identifying highly matched BLOCKS and PRINTS motifs. We also report two case studies of interacting motif pairs that map well to two interacting domain pairs in iPfam.

Availability: The software and supplementary information are available at http://www.cs.cityu.edu.hk/~lwang/software/ppi/index.html.
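The abstract does not spell out the quasi-biclique condition or the greedy procedure. The Python sketch below assumes a common δ-quasi-biclique definition (two disjoint, non-empty protein groups X and Y in which every protein of one group interacts with at least a (1 - δ) fraction of the other group) and illustrates a generic seed-and-grow greedy strategy under that assumption. It is not the authors' algorithm; the function names `build_graph`, `is_quasi_biclique`, and `greedy_quasi_biclique`, the seed-pair interface, the scoring rule, and the toy network are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Undirected network: protein -> set of interacting proteins.
Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build symmetric adjacency sets from an edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def is_quasi_biclique(g: Graph, X: Set[str], Y: Set[str], delta: float) -> bool:
    """Assumed definition: X and Y are disjoint and non-empty, every vertex in X
    has at least (1 - delta) * |Y| neighbours in Y, and vice versa."""
    if not X or not Y or X & Y:
        return False
    if any(len(g[x] & Y) < (1 - delta) * len(Y) for x in X):
        return False
    return all(len(g[y] & X) >= (1 - delta) * len(X) for y in Y)


def greedy_quasi_biclique(
    g: Graph, seed_x: str, seed_y: str, delta: float
) -> Tuple[Set[str], Set[str]]:
    """Hypothetical greedy heuristic: grow X and Y from an interacting seed pair,
    each round adding the vertex (to whichever side keeps the quasi-biclique
    condition) that has the most neighbours on the opposite side."""
    X, Y = {seed_x}, {seed_y}
    if not is_quasi_biclique(g, X, Y, delta):
        raise ValueError("seed proteins must interact")
    while True:
        best: Optional[Tuple[int, str, str]] = None  # (score, vertex, side)
        for v in g:
            if v in X or v in Y:
                continue
            for side in ("X", "Y"):
                new_x = X | {v} if side == "X" else X
                new_y = Y | {v} if side == "Y" else Y
                if not is_quasi_biclique(g, new_x, new_y, delta):
                    continue
                # Score: neighbours of v on the side opposite to where it goes.
                score = len(g[v] & (Y if side == "X" else X))
                if best is None or score > best[0]:
                    best = (score, v, side)
        if best is None:  # no vertex can be added without breaking the condition
            return X, Y
        _, v, side = best
        (X if side == "X" else Y).add(v)


# Toy network: a3-b3 is "missing" (noise), and c is a spurious partner of a1.
TOY_EDGES = [
    ("a1", "b1"), ("a1", "b2"), ("a1", "b3"),
    ("a2", "b1"), ("a2", "b2"), ("a2", "b3"),
    ("a3", "b1"), ("a3", "b2"),
    ("c", "a1"),
]

if __name__ == "__main__":
    g = build_graph(TOY_EDGES)
    X, Y = greedy_quasi_biclique(g, "a1", "b1", delta=0.34)
    print(sorted(X), sorted(Y))  # ['a1', 'a2', 'a3'] ['b1', 'b2', 'b3']
```

In the toy run, the missing a3-b3 interaction is tolerated while the weakly attached protein c is left out. Since both maximization problems are NP-hard, a greedy pass like this carries no optimality guarantee, and different seed pairs can yield different groups; one would typically try many seeds and keep the best result.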

[1] Y. Tani et al. Recent progress of vitamin B6 biosynthesis. Journal of Nutritional Science and Vitaminology, 2004.

[2] T. Attwood et al. PRINTS: a protein motif fingerprint database. Protein Engineering, 1994.

[3] Mihalis Yannakakis et al. Node-Deletion Problems on Bipartite Graphs. SIAM Journal on Computing, 1981.

[4] J. Spudich et al. Laser-induced transient grating analysis of dynamics of interaction between sensory rhodopsin II D75N and the HtrII transducer. Biophysical Journal, 2007.

[5] Gideon Schechtman et al. Approximating bounded 0-1 integer linear programs. Proceedings of the 2nd Israel Symposium on Theory and Computing Systems, 1993.

[6] J. Spudich et al. Early Photocycle Structural Changes in a Bacteriorhodopsin Mutant Engineered to Transmit Photosensory Signals. Journal of Biological Chemistry, 2007.

[7] S. Pietrokovski. Searching databases of conserved sequence regions by aligning protein multiple-alignments. Nucleic Acids Research, 1996.

[8] David S. Johnson et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.

[9] Gösta Grahne et al. Efficiently Using Prefix-trees in Mining Frequent Itemsets. FIMI, 2003.

[10] Eric Martz et al. Protein Explorer: easy yet powerful macromolecular visualization. Trends in Biochemical Sciences, 2002.

[11] T. Takagi et al. Assessment of prediction accuracy of protein function from protein–protein interaction data. Yeast, 2001.

[12] Robert D. Finn et al. iPfam: visualization of protein–protein interactions in PDB at domain and amino acid resolutions. Bioinformatics, 2005.

[13] D. Cane et al. Crystal Structure of Escherichia coli PdxA, an Enzyme Involved in the Pyridoxal Phosphate Biosynthesis Pathway. Journal of Biological Chemistry, 2003.

[14] D. Bu et al. Topological structure analysis of the protein-protein interaction network in budding yeast. Nucleic Acids Research, 2003.

[15] Desmond J. Higham et al. A lock-and-key model for protein-protein interactions. Bioinformatics, 2006.

[16] René Peeters et al. The maximum edge biclique problem is NP-complete. Discrete Applied Mathematics, 2003.

[17] R. Durbin et al. Pfam: A comprehensive database of protein domain families based on seed alignments. Proteins, 1997.

[18] Richard M. Karp et al. Reducibility Among Combinatorial Problems. 50 Years of Integer Programming, 1972.

[19] Tudor Savopol et al. Molecular basis of transmembrane signalling by sensory rhodopsin II–transducer complex. Nature, 2002.

[20] C. Cannings et al. On the structure of protein-protein interaction networks. Biochemical Society Transactions, 2003.

[21] Alex Bateman et al. The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Research, 2001.

[22] Jinyan Li et al. Discovering Motif Pairs at Interaction Sites from Protein Sequences on a Proteome-wide Scale. Bioinformatics, 2022.

[23] J. Spudich et al. Photoactivation Perturbs the Membrane-embedded Contacts between Sensory Rhodopsin II and Its Transducer. Journal of Biological Chemistry, 2005.

[24] I. Tews et al. Two independent routes of de novo vitamin B6 biosynthesis: not that different after all. The Biochemical Journal, 2007.

[25] Xiaogang Wang et al. Clustering by common friends finds locally significant proteins mediating modules. Bioinformatics, 2007.