A probabilistic model of nuclear import of proteins

MOTIVATION: Nucleo-cytoplasmic trafficking of proteins is a core regulatory process that sustains the integrity of the nuclear space of eukaryotic cells through an interplay between numerous factors. Although a number of nuclear localization signals have been characterized experimentally, the presence of such a signal alone remains an unreliable indicator of actual translocation.

RESULTS: This article introduces a probabilistic model that explicitly recognizes a variety of nuclear localization signals and integrates relevant amino acid sequence and interaction data for any candidate nuclear protein. In particular, we develop and incorporate scoring functions based on distinct classes of classical nuclear localization signals. Our empirical results show that the model accurately predicts whether a protein is imported into the nucleus, surpassing the classification accuracy of similar predictors when evaluated on the mouse and yeast proteomes (area under the receiver operating characteristic curve of 0.84 and 0.80, respectively). The model also predicts the sequence position of a nuclear localization signal and whether it interacts with importin-α.

AVAILABILITY: http://pprowler.itee.uq.edu.au/NucImport
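The area under the receiver operating characteristic curve quoted above can be read as the probability that a randomly chosen nuclear protein receives a higher score than a randomly chosen non-nuclear one. The following is a minimal sketch of that computation, not the evaluation code used for the reported results; the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for nuclear (positive), 0 for non-nuclear (negative).
    scores: predicted probability of nuclear import, one per protein.
    Tied positive/negative pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six proteins with made-up scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of nuclear from non-nuclear proteins. The pairwise loop is quadratic in the number of proteins; for proteome-scale data a rank-based formulation would be the more practical choice.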

[1]  M. Tomita,et al.  Six Classes of Nuclear Localization Signals Specific to Different Binding Grooves of Importin α* , 2009, Journal of Biological Chemistry.

[2]  Charles Elkan,et al.  Expectation Maximization Algorithm , 2010, Encyclopedia of Machine Learning.

[3]  Yuh Min Chook,et al.  Rules for nuclear localization sequence recognition by karyopherin beta 2. , 2006, Cell.

[4]  Markus Brameier,et al.  BIOINFORMATICS APPLICATIONS NOTE doi:10.1093/bioinformatics/btm066 Sequence analysis NucPred—Predicting nuclear localization of proteins , 2007 .

[5]  Mikael Bodén,et al.  Molecular basis for specificity of nuclear import and prediction of nuclear localization. , 2011, Biochimica et biophysica acta.

[6]  Gary D Bader,et al.  Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry , 2002, Nature.

[7]  Alexander J. Smola,et al.  Advances in Large Margin Classifiers , 2000 .

[8]  E. O’Shea,et al.  Global analysis of protein localization in budding yeast , 2003, Nature.

[9]  B. Rost,et al.  Finding nuclear localization signals , 2000, EMBO reports.

[10]  P. Bork,et al.  Proteome survey reveals modularity of the yeast cell machinery , 2006, Nature.

[11]  Eleazar Eskin,et al.  The Spectrum Kernel: A String Kernel for SVM Protein Classification , 2001, Pacific Symposium on Biocomputing.

[12]  김삼묘,et al.  “Bioinformatics” 특집을 내면서 , 2000 .

[13]  Pierre Baldi,et al.  Assessing the accuracy of prediction algorithms for classification: an overview , 2000, Bioinform..

[14]  B. Rost,et al.  Mimicking cellular sorting improves prediction of subcellular localization. , 2005, Journal of molecular biology.

[15]  Chikatoshi Kai,et al.  Towards defining the nuclear proteome , 2008, Genome Biology.

[16]  C. Christophe-Hobertus,et al.  Nuclear targeting of proteins: how many different signals? , 2000, Cellular signalling.

[17]  K. Nakai,et al.  PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. , 1999, Trends in biochemical sciences.

[18]  T. Moon The expectation-maximization algorithm , 1996, IEEE Signal Process. Mag..

[19]  John D. Aitchison,et al.  Cell biology: Pore puzzle , 2007, Nature.

[20]  Christian von Mering,et al.  STRING 8—a global view on proteins and their functional interactions in 630 organisms , 2008, Nucleic Acids Res..

[21]  Piero Fariselli,et al.  BaCelLo: a balanced subcellular localization predictor , 2006, ISMB.

[22]  B. Chait,et al.  The molecular architecture of the nuclear pore complex , 2007, Nature.

[23]  John Hawkins,et al.  Predicting nuclear localization. , 2007, Journal of proteome research.

[24]  M. Tomita,et al.  Systematic identification of cell cycle-dependent yeast nucleocytoplasmic shuttling proteins by prediction of composite motifs , 2009, Proceedings of the National Academy of Sciences.

[25]  M. Hodel,et al.  Dissection of a Nuclear Localization Signal* , 2001, The Journal of Biological Chemistry.

[26]  Mike Tyers,et al.  BioGRID: a general repository for interaction datasets , 2005, Nucleic Acids Res..

[27]  Burkhard Rost,et al.  NLSdb: database of nuclear localization signals , 2003, Nucleic Acids Res..

[28]  P. Bork,et al.  Functional organization of the yeast proteome by systematic analysis of protein complexes , 2002, Nature.

[29]  Y. Chook,et al.  Rules for Nuclear Localization Sequence Recognition by Karyopherinβ2 , 2006, Cell.

[30]  Bostjan Kobe,et al.  Structural Basis for the Specificity of Bipartite Nuclear Localization Sequence Binding by Importin-α* , 2003, Journal of Biological Chemistry.

[31]  William Stafford Noble,et al.  Assessing computational tools for the discovery of transcription factor binding sites , 2005, Nature Biotechnology.

[32]  P. Bartlett,et al.  Probabilities for SV Machines , 2000 .

[33]  Tom Fawcett,et al.  ROC Graphs: Notes and Practical Considerations for Researchers , 2007 .

[34]  Alan M. Moses,et al.  NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction , 2009, BMC Bioinformatics.

[35]  R. Ozawa,et al.  A comprehensive two-hybrid analysis to explore the yeast protein interactome , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[36]  G. Blobel,et al.  Crystallographic Analysis of the Recognition of a Nuclear Localization Signal by the Nuclear Import Factor Karyopherin α , 1998, Cell.