In silico prediction of yeast deletion phenotypes.

Analysis of gene deletions is a fundamental approach for investigating gene function. We evaluated an algorithm that uses classification techniques to predict the phenotypic effects of gene deletions in yeast. Features were selected and weighted with a modified simulated annealing algorithm. The features that received high weights were phylogenetic conservation scores for bacteria, fungi (excluding Ascomycota), Ascomycota (excluding Saccharomyces cerevisiae), plants, and mammals; the degree of paralogy; and the number of protein-protein interactions. Classification was performed with weighted k-nearest neighbor and support vector machine algorithms. To demonstrate how this approach might complement existing experimental procedures, we applied the algorithm to predict essential genes and genes whose deletion causes morphological alterations in yeast.
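The combination of feature weighting by simulated annealing and weighted k-nearest neighbor classification can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe how the annealing algorithm was modified, so plain simulated annealing is used here. The use of weights inside a Euclidean distance, the leave-one-out accuracy objective, the cooling schedule, the toy two-feature data set, and all function names are assumptions made for illustration.

```python
import math
import random


def weighted_knn_predict(train_X, train_y, x, weights, k=3):
    """Majority vote (binary labels 0/1) among the k nearest training points,
    using a feature-weighted Euclidean distance (an assumed weighting scheme)."""
    dists = []
    for row, label in zip(train_X, train_y):
        d = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, row, x)))
        dists.append((d, label))
    dists.sort(key=lambda pair: pair[0])
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0


def loo_accuracy(X, y, weights, k=3):
    """Leave-one-out accuracy of the weighted k-NN classifier."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if weighted_knn_predict(rest_X, rest_y, X[i], weights, k) == y[i]:
            correct += 1
    return correct / len(X)


def anneal_weights(X, y, n_steps=200, t_start=0.1, cooling=0.98, seed=0):
    """Plain simulated annealing over feature weights in [0, 1].
    Returns the best weight vector seen and its leave-one-out accuracy."""
    rng = random.Random(seed)
    n_features = len(X[0])
    current = [1.0] * n_features
    current_score = loo_accuracy(X, y, current)
    best, best_score = list(current), current_score
    temp = t_start
    for _ in range(n_steps):
        # Perturb one randomly chosen weight and clip it to [0, 1].
        cand = list(current)
        j = rng.randrange(n_features)
        cand[j] = min(1.0, max(0.0, cand[j] + rng.uniform(-0.3, 0.3)))
        score = loo_accuracy(X, y, cand)
        delta = score - current_score
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temp) (Metropolis criterion).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = cand, score
            if current_score > best_score:
                best, best_score = list(current), current_score
        temp *= cooling  # geometric cooling schedule (an assumption)
    return best, best_score


def make_toy_data(n=40, seed=1):
    """Hypothetical data: feature 0 tracks the label, feature 1 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([label + rng.gauss(0, 0.3), rng.gauss(0, 3.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("accuracy with equal weights:", loo_accuracy(X, y, [1.0, 1.0]))
    weights, score = anneal_weights(X, y)
    print("annealed weights:", weights, "accuracy:", score)
```

In a sketch like this, a feature whose weight is driven toward zero is effectively deselected, which is one way a single search can perform both feature selection and feature weighting.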
