NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction

Background

Nuclear localization signals (NLSs) are stretches of residues within a protein that are important for the regulated nuclear import of the protein. Of the many import pathways that exist in yeast, the best characterized is termed the 'classical' NLS pathway. The classical NLS contains specific patterns of basic residues, and computational methods have been designed to predict the location of these motifs in proteins. The consensus sequences, or patterns, for the other import pathways are less well understood.

Results

In this paper, we present an analysis of characterized NLSs in yeast and find that, despite the large number of nuclear import pathways, NLSs seem to show similar patterns of amino acid residues. We test current prediction methods and observe a low true positive rate. We therefore suggest an approach using hidden Markov models (HMMs) to predict novel NLSs in proteins. We show that our method is able to consistently find 37% of the NLSs with a low false positive rate and that our method retains its true positive rate outside of the yeast data set used for the training parameters.

Conclusion

Our implementation of this model, NLStradamus, is made available at: http://www.moseslab.csb.utoronto.ca/NLStradamus/
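To illustrate the general idea of labelling NLS-like regions with an HMM, the following is a minimal sketch of a two-state model (background and NLS) decoded with the Viterbi algorithm. It is not the NLStradamus implementation: the state layout, the function names (`emission`, `viterbi`, `nls_segments`), and every probability below are assumptions chosen for illustration, not the trained parameters described in the paper. The only idea taken from the abstract is that the signal is enriched in basic residues, modelled here as a higher emission probability for K and R in the NLS state.

```python
import math

STATES = ("background", "nls")
BASIC = set("KR")  # basic residues assumed to be enriched in the NLS state

# Illustrative, hand-picked parameters (assumptions, not trained values).
START = {"background": 0.95, "nls": 0.05}
TRANS = {
    "background": {"background": 0.98, "nls": 0.02},
    "nls": {"background": 0.10, "nls": 0.90},
}


def emission(state, residue):
    """Emission probability over the 20 amino acids for a given state."""
    if state == "nls":
        # K and R each get 0.30; the other 18 residues share 0.40.
        return 0.30 if residue in BASIC else 0.40 / 18
    # Background is uniform: 0.05 for each of the 20 residues.
    return 0.05


def viterbi(sequence):
    """Return the most probable state path for a protein sequence."""
    if not sequence:
        return []
    # Work in log space to avoid underflow on long sequences.
    scores = {
        s: math.log(START[s]) + math.log(emission(s, sequence[0]))
        for s in STATES
    }
    backpointers = []
    for residue in sequence[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(
                STATES, key=lambda p: scores[p] + math.log(TRANS[p][s])
            )
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(emission(s, residue))
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def nls_segments(sequence):
    """Return half-open (start, end) index pairs labelled as NLS."""
    segments, start = [], None
    path = viterbi(sequence)
    for i, state in enumerate(path):
        if state == "nls" and start is None:
            start = i
        elif state != "nls" and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(path)))
    return segments


if __name__ == "__main__":
    # Toy sequence: a basic-rich stretch flanked by alanines.
    toy = "A" * 20 + "KRKRKRKRKR" + "A" * 20
    print(nls_segments(toy))
```

With these toy parameters, a run of basic residues is labelled as NLS only when its emission gain outweighs the cost of entering and leaving the NLS state, which is the trade-off an HMM makes between sensitivity and false positives. A real predictor would estimate the transition and emission probabilities from annotated NLSs rather than fixing them by hand.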
