Query-Adaptive Ranking with Support Vector Machines for Protein Homology Prediction

Protein homology prediction is a crucial step in template-based protein structure prediction. The functions that rank the proteins in a database according to their homology to a query protein are key to the success of protein structure prediction. In information retrieval terms, such functions are called ranking functions, and they are often constructed by machine learning approaches. Unlike in traditional machine learning problems, the feature vectors in the ranking-function learning problem are not independent and identically distributed: they are calculated with respect to queries, and their statistical characteristics may vary greatly from query to query. At present, few algorithms exploit this query dependence to improve ranking performance. This paper proposes a query-adaptive ranking-function learning algorithm for protein homology prediction. Experiments with the support vector machine (SVM) as the benchmark learner demonstrate that the proposed algorithm significantly improves the ranking performance of SVMs in the protein homology prediction task.
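To make the query-dependence issue concrete, the following is a minimal sketch of one simple way to reduce between-query variation: standardizing each feature within the block of candidates that belongs to a single query, before any ranker is trained. This is an illustrative assumption, not the algorithm proposed in the paper; the function name `normalize_per_query`, the `(query_id, feature_vector)` input layout, and the toy values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_per_query(rows):
    """Z-score each feature within its own query block.

    rows: list of (query_id, feature_vector) pairs (hypothetical layout).
    Returns a list in the same order with per-query standardized features.
    """
    # Collect the row indices that belong to each query.
    by_query = defaultdict(list)
    for idx, (qid, _) in enumerate(rows):
        by_query[qid].append(idx)

    out = [None] * len(rows)
    for qid, idxs in by_query.items():
        n_feat = len(rows[idxs[0]][1])
        # Column-wise view of this query's feature vectors.
        cols = [[rows[i][1][j] for i in idxs] for j in range(n_feat)]
        mus = [mean(c) for c in cols]
        # Fall back to 1.0 when a feature is constant within the query.
        sds = [pstdev(c) or 1.0 for c in cols]
        for i in idxs:
            out[i] = (qid, [(x - m) / s for x, m, s in zip(rows[i][1], mus, sds)])
    return out


# Toy data: two queries whose raw features live on very different scales.
rows = [
    ("q1", [10.0, 1.0]),
    ("q1", [20.0, 3.0]),
    ("q2", [1000.0, 0.1]),
    ("q2", [3000.0, 0.3]),
]
normalized = normalize_per_query(rows)
```

After this step, candidates from both toy queries sit on a comparable scale, so a single ranking function (for example, an SVM) no longer has to absorb raw scale differences between queries. Whether such normalization helps on real homology data is an empirical question that this sketch does not address.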
