A Protein Secondary Structure Prediction Tool Using Two-Level Strategy to Improve the Prediction Accuracy of Secondary Structures and Structure Boundaries

An important limitation of current protein secondary structure prediction tools is their poor performance in locating secondary structure boundaries. Efficiently exploiting the position-specific residue preferences around secondary structure boundaries can help resolve this problem. TLSSP (Two-Level Secondary Structure Predictor), proposed in this study, uses a two-level strategy to exploit these preferences and find the optimal global secondary structure. In TLSSP, a set of binary classifiers first recognizes the boundaries of helices and strands; a global model based on conditional random fields (CRFs) then predicts the secondary structures. A five-fold cross-validation test on the EVA dataset (3744 proteins provided by the EVA service) indicates that TLSSP performs well on both boundary prediction and global secondary structure prediction.
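The abstract does not specify TLSSP's features or how its CRF is parameterized. As an illustration only, the sketch below shows one way that level-1 boundary scores could feed a level-2 linear-chain CRF: as position-dependent transition features, with the global labelling recovered by Viterbi decoding. The three-state H/E/C alphabet, the names `viterbi`, `make_trans`, `helix_start` and `strand_start`, the switch penalty, the weight `w` and all numeric scores are hypothetical assumptions. Nothing here is learned from data.

```python
from typing import Callable, Dict, List, Sequence

LABELS = ("H", "E", "C")  # assumed 3-state alphabet: helix, strand, coil

# trans(i, p, y): score of label p at residue i-1 followed by label y at residue i
TransFn = Callable[[int, str, str], float]


def viterbi(emissions: Sequence[Dict[str, float]], trans: TransFn) -> List[str]:
    """Return the highest-scoring label path of a linear-chain CRF (log-space scores)."""
    n = len(emissions)
    if n == 0:
        return []
    best = [dict(emissions[0])]        # best[i][y]: best path score ending in y at i
    back: List[Dict[str, str]] = [{}]  # back[i][y]: predecessor label on that path
    for i in range(1, n):
        best.append({})
        back.append({})
        for y in LABELS:
            p = max(LABELS, key=lambda q: best[i - 1][q] + trans(i, q, y))
            best[i][y] = best[i - 1][p] + trans(i, p, y) + emissions[i][y]
            back[i][y] = p
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda y: best[n - 1][y])]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def make_trans(helix_start: Sequence[float], strand_start: Sequence[float],
               switch_penalty: float = -1.0, w: float = 1.0) -> TransFn:
    """Hypothetical level-2 transition feature built from level-1 outputs.

    helix_start[i] / strand_start[i] stand in for the scores of level-1 binary
    classifiers that a helix / strand begins at residue i (index 0 is unused,
    since the first residue has no incoming transition).
    """
    def trans(i: int, p: str, y: str) -> float:
        if p == y:
            return 0.0                     # staying in the same state is free
        score = switch_penalty             # any state change costs something
        if y == "H":
            score += w * helix_start[i]    # a predicted helix start offsets the cost
        elif y == "E":
            score += w * strand_start[i]   # likewise for a predicted strand start
        return score
    return trans


# Toy per-residue label scores (made up for illustration).
emis = [{"H": 0.0, "E": 0.0, "C": 1.0},
        {"H": 0.8, "E": 0.0, "C": 0.5},
        {"H": 0.8, "E": 0.0, "C": 0.5}]
print("".join(viterbi(emis, make_trans([0, 0, 0], [0, 0, 0]))))    # CCC
print("".join(viterbi(emis, make_trans([0, 0.9, 0], [0, 0, 0]))))  # CHH
```

In this toy case the residue scores alone slightly favour coil throughout. A confident helix-start signal at residue 1 is enough to shift the globally optimal path to coil followed by helix. This illustrates, under the stated assumptions, why boundary evidence and a global model can complement each other.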
