Sequence Similarity Searching Using the BLAST Family of Programs

The BLAST (Basic Local Alignment Search Tool) family of sequence similarity search programs lets users search a nucleotide or amino acid sequence database with either a nucleotide or an amino acid query sequence. The programs return a list of the database sequences that match the query (the "hits"), the alignments of those hits to the query, and statistical values for each match. This unit describes how to choose an appropriate BLAST program and database, perform the search, and interpret the results.
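The choice of program follows from the molecule types of the query and the database. The Python sketch below encodes the standard pairing with the classic BLAST programs: blastn (nucleotide vs. nucleotide), blastp (protein vs. protein), blastx (translated nucleotide query vs. protein database), tblastn (protein query vs. translated nucleotide database), and tblastx (translated nucleotide query vs. translated nucleotide database). The function names `choose_blast_program` and `expect_value` and the `translate_nucleotide` flag are hypothetical and are not part of any BLAST distribution. The E-value helper is only an approximation. It uses the simplified relation E = m * n * 2^(-S'), where m is the query length, n is the database length, and S' is the bit score. It omits the effective-length corrections that BLAST itself applies.

```python
VALID_TYPES = {"nucleotide", "protein"}


def choose_blast_program(query_type: str, db_type: str,
                         translate_nucleotide: bool = False) -> str:
    """Return the classic BLAST program name for a query/database pair.

    query_type, db_type: "nucleotide" or "protein" (assumed vocabulary).
    translate_nucleotide: hypothetical flag; for a nucleotide query against
    a nucleotide database, compare six-frame translations (tblastx)
    instead of comparing nucleotides directly (blastn).
    """
    if query_type not in VALID_TYPES or db_type not in VALID_TYPES:
        raise ValueError("types must be 'nucleotide' or 'protein'")
    if query_type == "nucleotide" and db_type == "nucleotide":
        return "tblastx" if translate_nucleotide else "blastn"
    if query_type == "protein" and db_type == "protein":
        return "blastp"
    if query_type == "nucleotide" and db_type == "protein":
        return "blastx"   # query translated in all six frames
    return "tblastn"      # database translated in all six frames


def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate the E-value from a bit score: E = m * n * 2**(-S').

    Simplification: raw lengths are used instead of the effective
    lengths BLAST computes, so real BLAST output will differ somewhat.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    print(choose_blast_program("protein", "nucleotide"))  # tblastn
    print(expect_value(40.0, 300, 10**8))
```

The second call shows why long databases need high scores. For a fixed bit score, the expected number of chance hits grows in proportion to the database length. Each additional bit of score halves that expectation.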
