Classifying Protein Specific Residue Structures Based on Graph Mining

We represent the 3-D structure of a protein as a simple connected graph, in which nodes denote amino acids and edges represent contact distances between amino acids. Based on these graphs, we present a graph mining algorithm that identifies the crucial subgraphs, which can then be used to classify protein structural families. The proposed algorithm was compared with BLAST, BLAT, and DALI. In addition, an experiment found characteristic substructural patterns in several protein families from the Protein Data Bank.
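The graph representation described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each residue is reduced to a single 3-D coordinate (for example, its C-alpha atom) and that two residues are in contact when their distance is at most a cutoff. The 8.0 angstrom cutoff, the function names `build_contact_graph` and `is_connected`, and the toy coordinates are all hypothetical choices made for illustration.

```python
import math
from collections import deque


def build_contact_graph(residues, cutoff=8.0):
    """Build a residue contact graph.

    residues: list of (residue_name, (x, y, z)) tuples.
    cutoff:   assumed contact threshold in angstroms (hypothetical value).
    Returns (labels, edges), where edges maps (i, j) with i < j to the
    distance between residues i and j.
    """
    labels = [name for name, _ in residues]
    edges = {}
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            d = math.dist(residues[i][1], residues[j][1])
            if d <= cutoff:
                edges[(i, j)] = d
    return labels, edges


def is_connected(n, edges):
    """Breadth-first check that a graph with nodes 0..n-1 is connected."""
    if n == 0:
        return True
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n


# Toy input: three residues close together and one far away (made-up data).
toy = [
    ("ALA", (0.0, 0.0, 0.0)),
    ("GLY", (3.8, 0.0, 0.0)),
    ("SER", (7.6, 0.0, 0.0)),
    ("LEU", (30.0, 0.0, 0.0)),
]
labels, edges = build_contact_graph(toy)
connected = is_connected(len(labels), edges)
```

In this toy input the distant residue has no contacts, so the resulting graph is not connected. A pipeline that requires a simple connected graph would need to either raise the cutoff or restrict attention to one connected component before mining subgraphs.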
