Automatic 3D Protein Structure Classification without Structural Alignment

In this paper, we present ProtClass, a new scheme for automatic classification of three-dimensional (3D) protein structures. It is a dedicated, unified multiclass classification scheme that requires neither detailed structural alignment nor multiple binary classifications. We adopt a nearest neighbor-based classification strategy within a filter-and-refine scheme. In the first step, we filter out improbable answers using parameters precalculated from the training data. In the second step, we perform a more detailed nearest neighbor search on the remaining answers. Both steps use concise and effective encodings of the 3D protein structures. We compare ProtClass against two other dedicated protein structure classification schemes, SGM and CPMine. The experimental results show that ProtClass is slightly more accurate than SGM and much faster. Compared with CPMine, ProtClass is much more accurate, while the running times are about the same. We also compare ProtClass against DALI, a structural alignment-based classification scheme, which is more accurate but extremely slow. The software is available upon request from the authors. Supplementary information on the ProtClass method can be found at: http://xena1.ddns.comp.nus.edu.sg/~genesis/PClass.htm.
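The filter-and-refine strategy described above can be illustrated with a minimal Python sketch. It is not the ProtClass implementation: the abstract does not specify the structure encodings or the precalculated parameters, so the sketch assumes each structure is already encoded as a fixed-length numeric vector, uses a per-class centroid and radius as hypothetical precalculated parameters, and uses Euclidean distance in both steps. The names `precompute_class_stats`, `classify`, and `slack` are illustrative only.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def precompute_class_stats(training):
    """Assumed filter parameters: a centroid and radius per class.

    training is a list of (vector, label) pairs.
    """
    by_label = defaultdict(list)
    for vec, label in training:
        by_label[label].append(vec)
    stats = {}
    for label, vecs in by_label.items():
        dim = len(vecs[0])
        centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        radius = max(euclidean(v, centroid) for v in vecs)
        stats[label] = (centroid, radius)
    return stats


def classify(query, training, stats, slack=1.0):
    """Filter improbable classes, then run 1-nearest-neighbor on the rest."""
    # Filter step: keep classes whose centroid is within radius + slack.
    candidates = {
        label
        for label, (centroid, radius) in stats.items()
        if euclidean(query, centroid) <= radius + slack
    }
    if not candidates:
        # Nothing survived the filter, so fall back to all classes.
        candidates = set(stats)
    # Refine step: nearest neighbor among the surviving classes only.
    best_label, best_dist = None, math.inf
    for vec, label in training:
        if label in candidates:
            d = euclidean(query, vec)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


# Toy data standing in for encoded structures (hypothetical values).
training = [
    ([0.0, 0.0], "alpha"),
    ([0.2, 0.1], "alpha"),
    ([5.0, 5.0], "beta"),
    ([5.2, 4.9], "beta"),
]
stats = precompute_class_stats(training)
print(classify([0.1, 0.3], training, stats))
print(classify([5.1, 5.0], training, stats))
```

In this sketch the filter step touches only one centroid per class, so the more expensive per-structure comparison runs on a reduced candidate set. That is the general motivation for filter-and-refine, whatever the actual encodings are.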
