Paper information - Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
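The idea of a position-specific score matrix mentioned in the abstract can be illustrated with a small sketch. This is not the construction used by PSI-BLAST, which is more involved. It assumes an ungapped alignment of equal-length sequences, a uniform background frequency for the 20 amino acids, and a fixed pseudocount. The names `build_pssm` and `best_window` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Build a toy position-specific score matrix.

    Returns one {residue: log-odds score} dict per alignment column.
    Assumptions (not taken from the paper): ungapped input, uniform
    background frequencies, and a constant additive pseudocount.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)

    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            # Positive score: residue is enriched at this position.
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def best_window(pssm, target):
    """Slide the matrix along `target` without gaps.

    Returns (score, offset) of the best-scoring window, or
    (-inf, -1) if `target` is shorter than the matrix.
    `target` must contain only the 20 standard residues.
    """
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(target) - width + 1):
        score = sum(pssm[i][target[start + i]] for i in range(width))
        best = max(best, (score, start))
    return best


if __name__ == "__main__":
    matrix = build_pssm(["ACDE", "ACDE", "ACDQ"])
    score, offset = best_window(matrix, "WWACDEWW")
    print(f"best window starts at {offset} with score {score:.2f}")
```

Unlike a single substitution matrix, each column carries its own scores, so a residue that is conserved at one position is rewarded there and nowhere else. In an iterated search, the significant hits from one round would be used to rebuild the matrix for the next round.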
