Transductive learning with EM algorithm to classify proteins based on phylogenetic profiles

We proposed a novel method for protein classification based on phylogenetic profiles. Each protein's profile was extended with extra bits that encode the structure of the phylogenetic tree and with the likelihood, expressed as weights on the profile indices, of the protein's functional family membership in each of the reference genomes. The extended profiles were then incorporated into the kernel of a support vector machine, which was trained in a transductive learning scheme that used the EM algorithm to update the weights. When tested on the proteome of Saccharomyces cerevisiae, with the MIPS classification as the benchmark, the method substantially increased classification accuracy.
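The following is a minimal sketch of the general idea, not the authors' implementation. It shows a kernel over binary phylogenetic profiles with one weight per reference genome, and a transductive loop that alternates between tentatively labeling the unlabeled proteins and re-estimating the weights. Several things here are assumptions made for illustration: the support vector machine is replaced by a simple mean-similarity scorer so the example needs no external library, the weight update rule in `update_weights` is hypothetical, the tree-structure bits are omitted, and all function names and toy data are invented.

```python
from typing import List, Sequence, Tuple

Profile = Sequence[int]  # one bit per reference genome: 1 = homolog present


def weighted_kernel(x: Profile, y: Profile, w: Sequence[float]) -> float:
    """Weighted linear kernel: sum_i w[i] * x[i] * y[i]."""
    return sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))


def score(x: Profile, pos: List[Profile], neg: List[Profile],
          w: Sequence[float]) -> float:
    """Stand-in for an SVM decision value (assumption): mean kernel
    similarity to the positives minus mean similarity to the negatives."""
    s_pos = sum(weighted_kernel(x, p, w) for p in pos) / len(pos)
    s_neg = sum(weighted_kernel(x, n, w) for n in neg) / len(neg)
    return s_pos - s_neg


def update_weights(pos: List[Profile], neg: List[Profile],
                   eps: float = 0.05) -> List[float]:
    """Hypothetical weight update: a genome gets a larger weight when its
    presence frequency differs more between the two classes."""
    n = len(pos[0])
    raw = []
    for i in range(n):
        f_pos = sum(p[i] for p in pos) / len(pos)
        f_neg = sum(q[i] for q in neg) / len(neg)
        raw.append(abs(f_pos - f_neg) + eps)
    total = sum(raw)
    return [r * n / total for r in raw]  # rescale so the weights average 1


def transductive_fit(pos: List[Profile], neg: List[Profile],
                     unlabeled: List[Profile],
                     rounds: int = 5) -> Tuple[List[float], List[int]]:
    """Alternate between labeling the unlabeled profiles with the current
    weights and re-estimating the weights from all profiles."""
    w = [1.0] * len(pos[0])
    labels: List[int] = []
    for _ in range(rounds):
        # E-like step: tentative labels under the current weights
        labels = [1 if score(u, pos, neg, w) > 0 else -1 for u in unlabeled]
        all_pos = list(pos) + [u for u, l in zip(unlabeled, labels) if l == 1]
        all_neg = list(neg) + [u for u, l in zip(unlabeled, labels) if l == -1]
        # M-like step: re-estimate the per-genome weights
        w = update_weights(all_pos, all_neg)
    return w, labels


# Toy data (invented): four reference genomes.
pos = [[1, 1, 0, 1], [1, 1, 0, 0]]
neg = [[0, 0, 1, 1], [0, 0, 1, 0]]
unlabeled = [[1, 1, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0]]
weights, labels = transductive_fit(pos, neg, unlabeled)
```

In this toy run the fourth genome is present in both classes, so it ends up with the smallest weight, while the genomes that separate the classes are weighted up. With a real SVM, the same weighted kernel could be supplied as a precomputed Gram matrix.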
