Understanding protein dispensability through machine-learning analysis of high-throughput data

MOTIVATION: Protein dispensability is fundamental to the understanding of gene function and evolution. Recent advances in generating high-throughput data, such as genomic sequence data, protein-protein interaction data, gene-expression data and growth-rate data of mutants, allow us to investigate protein dispensability systematically at the genome scale.

RESULTS: In our studies, protein dispensability is represented as a fitness score that is measured by the growth rate of gene-deletion mutants. By analyzing high-throughput data in the yeast Saccharomyces cerevisiae, we found that a protein's dispensability correlates significantly with its evolutionary rate and duplication rate, as well as with its connectivity in the protein-protein interaction network and in the gene-expression correlation network. Neural networks and support vector machines were applied to predict protein dispensability from high-throughput data. Our studies shed light on global characteristics of protein dispensability and evolution.

AVAILABILITY: The original datasets for protein dispensability analysis and prediction, together with related scripts, are available at http://digbio.missouri.edu/~ychen/ProDispen/

CONTACT: xudong@missouri.edu
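As an illustration of the kind of correlation analysis the abstract describes, the following minimal sketch computes a rank correlation between a fitness score and network connectivity. It is not the authors' script: the abstract does not say which correlation statistic was used, so Spearman's rank correlation is an assumption here, and the function names and toy values are hypothetical and invented purely for demonstration.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy, made-up values (NOT data from the paper): fitness score of each
# gene-deletion mutant and the protein's number of interaction partners.
fitness = [1.0, 0.9, 0.7, 0.4, 0.0]
degree = [1, 2, 5, 9, 20]

rho = spearman(fitness, degree)
print(f"Spearman rho between fitness and connectivity: {rho:.2f}")
```

A rank-based statistic is used in this sketch because fitness scores and connectivity counts are unlikely to be linearly related; for real data, an established implementation such as `scipy.stats.spearmanr`, which also reports a p-value, would be the more practical choice.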
