Finding Homologs in Amino Acid Sequences Using Network BLAST Searches

BLAST (Basic Local Alignment Search Tool) is used more frequently than any other biosequence database search program. This unit shows how to run searches on the Web and how to fine‐tune search parameters for a specific research project. It also offers guidance on interpreting results, assessing statistical significance and biological relevance, and selecting complementary analyses. The unit covers three classes of BLAST programs: standard protein‐to‐protein searches; translated searches, in which either the query or the database consists of nucleotide sequences that are translated into protein; and programs that compare two sequences directly (as opposed to searching one sequence against a database of sequences).
