Selecting near-native structures from decoys using maximal cliques

Protein structure prediction is one of the most important problems in computational structural biology. The prediction process typically generates many candidate structures, known as decoys. Selecting, from these decoys, the model that is closest to the native structure remains an unsolved and challenging problem. One important approach to selecting near-native structures is to cluster the decoys. Traditional methods apply standard clustering algorithms directly, and these are usually ill-suited to the high-dimensional conformation space. Here we propose a method based on maximal cliques from graph theory to address this problem. The pairwise similarities between decoys are first computed using TM-score, and a graph is built from the shared nearest neighbor (SNN) information among the decoys. The maximal cliques of this graph are then enumerated, and the centroids of these maximal cliques are selected as near-native structures. Experiments show that, compared with the traditional methods, the proposed method selects near-native structures that have higher similarity to the native structures.
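
The pipeline described above (pairwise similarity, SNN graph, maximal cliques, clique centroids) can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: the similarity matrix `sim` holds made-up values standing in for pairwise TM-scores, the SNN edge rule follows a Jarvis-Patrick style criterion (two decoys are linked if each is among the other's `k` nearest neighbors and they share at least `min_shared` neighbors), cliques are enumerated with a Bron-Kerbosch search with pivoting, and the centroid is taken to be the clique member with the highest total similarity to the other members. All function names and parameters are hypothetical.

```python
from typing import Dict, List, Sequence, Set

Matrix = Sequence[Sequence[float]]


def knn_sets(sim: Matrix, k: int) -> List[Set[int]]:
    """For each decoy, return the set of its k most similar other decoys."""
    n = len(sim)
    result = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        result.append(set(others[:k]))
    return result


def snn_graph(sim: Matrix, k: int, min_shared: int) -> Dict[int, Set[int]]:
    """Build an SNN graph (assumed Jarvis-Patrick style edge rule)."""
    nbrs = knn_sets(sim, k)
    n = len(sim)
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            mutual = i in nbrs[j] and j in nbrs[i]
            if mutual and len(nbrs[i] & nbrs[j]) >= min_shared:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def maximal_cliques(adj: Dict[int, Set[int]]) -> List[List[int]]:
    """Enumerate all maximal cliques with Bron-Kerbosch and pivoting."""
    cliques: List[List[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        if not p and not x:
            cliques.append(sorted(r))
            return
        # Pivot on the vertex with the most neighbors among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_centroid(clique: Sequence[int], sim: Matrix) -> int:
    """Return the member with the highest total similarity to the others."""
    return max(clique, key=lambda i: sum(sim[i][j] for j in clique if j != i))


# Toy similarity matrix for five decoys (illustrative values, not real TM-scores).
sim = [
    [1.00, 0.90, 0.80, 0.30, 0.20],
    [0.90, 1.00, 0.85, 0.25, 0.30],
    [0.80, 0.85, 1.00, 0.20, 0.25],
    [0.30, 0.25, 0.20, 1.00, 0.70],
    [0.20, 0.30, 0.25, 0.70, 1.00],
]

graph = snn_graph(sim, k=2, min_shared=1)
cliques = sorted(maximal_cliques(graph), key=len, reverse=True)
best = clique_centroid(cliques[0], sim)
print("largest clique:", cliques[0], "centroid decoy:", best)
```

In this toy example decoys 0, 1, and 2 form the largest clique, and decoy 1 is its centroid. With real data, `sim` would be filled from pairwise TM-score comparisons, and `k` and `min_shared` would need tuning for the decoy set at hand.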
