Coupling Graphs, Efficient Algorithms and B-Cell Epitope Prediction

This paper introduces coupling graphs to meet the needs of many applications, particularly in bioinformatics. A coupling graph is a two-layer graph complex in which each node in one layer is connected to at least one node in the other layer, and vice versa. The coupling graph model is expressive enough to capture strong, inherent associations between subgraph pairs in complex applications. This paper focuses on algorithms for mining frequent coupling subgraphs and on their application in bioinformatics. Although existing frequent subgraph mining algorithms can identify frequent subgraphs in a graph database, they perform poorly at mining frequent coupling subgraphs because they generate many irrelevant subgraphs. We propose a novel graph transformation technique that converts a coupling graph into a generic graph. Existing graph mining methods are then applied to the transformed coupling graphs to discover frequent coupling subgraphs. We prove that the transformation is precise and complete and that the restoration is reversible. Experiments on a database of 10,511 coupling graphs show that our algorithm substantially reduces mining time compared with existing subgraph mining algorithms. Moreover, we demonstrate the usefulness of frequent coupling subgraphs by applying our algorithm to make accurate predictions of epitopes in antibody-antigen binding.
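The coupling property in the definition can be checked directly. What follows is a minimal Python sketch under stated assumptions: the `CouplingGraph` class, its field names, and the helpers `is_coupling_graph`, `to_generic`, and `from_generic` are all hypothetical. In particular, `to_generic` is a naive layer-tagging encoding. It only shows what a lossless, reversible conversion into a single labeled graph can look like. It is not the novel transformation proposed in the paper, whose details are not given in this abstract.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical representation; the paper's own data structure is not specified here.
@dataclass
class CouplingGraph:
    nodes_a: dict[int, str]                                     # layer-A node id -> label
    nodes_b: dict[int, str]                                     # layer-B node id -> label
    edges_a: set[tuple[int, int]] = field(default_factory=set)  # edges inside layer A
    edges_b: set[tuple[int, int]] = field(default_factory=set)  # edges inside layer B
    cross: set[tuple[int, int]] = field(default_factory=set)    # (layer-A id, layer-B id)


def is_coupling_graph(g: CouplingGraph) -> bool:
    """True if every node in each layer touches at least one cross-layer edge."""
    linked_a = {a for a, _ in g.cross}
    linked_b = {b for _, b in g.cross}
    return set(g.nodes_a) <= linked_a and set(g.nodes_b) <= linked_b


def to_generic(g: CouplingGraph):
    """Naive layer-tagged encoding into one labeled graph (NOT the paper's technique)."""
    nodes = {("A", i): "A:" + lab for i, lab in g.nodes_a.items()}
    nodes.update({("B", i): "B:" + lab for i, lab in g.nodes_b.items()})
    edges = {(("A", u), ("A", v)): "intra" for u, v in g.edges_a}
    edges.update({(("B", u), ("B", v)): "intra" for u, v in g.edges_b})
    edges.update({(("A", a), ("B", b)): "cross" for a, b in g.cross})
    return nodes, edges


def from_generic(nodes, edges) -> CouplingGraph:
    """Inverse of to_generic: the layer tags make the encoding lossless."""
    g = CouplingGraph(
        nodes_a={i: lab[2:] for (layer, i), lab in nodes.items() if layer == "A"},
        nodes_b={i: lab[2:] for (layer, i), lab in nodes.items() if layer == "B"},
    )
    for (u, v), kind in edges.items():
        if kind == "cross":
            g.cross.add((u[1], v[1]))
        elif u[0] == "A":
            g.edges_a.add((u[1], v[1]))
        else:
            g.edges_b.add((u[1], v[1]))
    return g


# Toy example with illustrative labels only.
g = CouplingGraph(
    nodes_a={0: "TYR", 1: "SER"},
    nodes_b={0: "LYS", 1: "ASP"},
    edges_a={(0, 1)},
    edges_b={(0, 1)},
    cross={(0, 0), (1, 1)},
)
print(is_coupling_graph(g))               # True
print(from_generic(*to_generic(g)) == g)  # True: restoration recovers the original
```

Tagging every node with its layer makes restoration trivially reversible. However, a generic miner run on this naive encoding would still enumerate patterns that violate the coupling property, such as patterns drawn from one layer only. That is the kind of irrelevant output the abstract attributes to existing miners.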
