REMOTE PROTEIN HOMOLOGY DETECTION USING HIDDEN MARKOV MODELS

[1]  J. Thompson,et al.  Multiple sequence alignment with Clustal X. , 1998, Trends in biochemical sciences.

[2]  Robert Tibshirani,et al.  An Introduction to the Bootstrap , 1994 .

[3]  William R. Taylor,et al.  The rapid generation of mutation data matrices from protein sequences , 1992, Comput. Appl. Biosci..

[4]  Veronica Morea,et al.  Sequence conservation in families whose members have little or no sequence similarity: the four-helical cytokines and cytochromes. , 2002, Journal of molecular biology.

[5]  R Staden Computer methods to locate signals in nucleic acid sequences , 1984, Nucleic Acids Res..

[6]  Temple F. Smith,et al.  The statistical distribution of nucleic acid similarities. , 1985, Nucleic acids research.

[7]  I. Dodd,et al.  Systematic method for the detection of potential lambda Cro-like DNA-binding regions in proteins. , 1987, Journal of molecular biology.

[8]  P. Argos,et al.  Weighting aligned protein or nucleic acid sequences to correct for unequal representation. , 1990, Journal of molecular biology.

[9]  C. Orengo,et al.  Protein families and their evolution-a structural perspective. , 2005, Annual review of biochemistry.

[10]  R F Doolittle Some reflections on the early days of sequence searching. , 1997, Journal of molecular medicine.

[11]  Robert D. Finn,et al.  Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins , 1999, Nucleic Acids Res..

[12]  S. Henikoff,et al.  Automated assembly of protein blocks for database searching. , 1991, Nucleic acids research.

[13]  S. Karlin,et al.  Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. , 1990, Proceedings of the National Academy of Sciences of the United States of America.

[14]  S F Altschul,et al.  Weights for data related by a tree. , 1989, Journal of molecular biology.

[15]  A. Dembo,et al.  Limit Distribution of Maximal Non-Aligned Two-Sequence Segmental Score , 1994 .

[16]  W. Pearson,et al.  The limits of protein sequence comparison? , 2005, Current opinion in structural biology.

[17]  C. Ponting,et al.  On the evolution of protein folds: are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world? , 2001, Journal of structural biology.

[18]  Thomas L. Madden,et al.  Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. , 2001, Nucleic acids research.

[19]  Nick V Grishin,et al.  doi: 10.1110/ps.03197403 , 2003 .

[20]  Terence Hwa,et al.  Hybrid alignment: high-performance with universal statistics , 2002, Bioinform..

[21]  A. D. McLachlan,et al.  Profile analysis: detection of distantly related proteins. , 1987, Proceedings of the National Academy of Sciences of the United States of America.

[22]  W. Fitch An improved method of testing for evolutionary homology. , 1966, Journal of molecular biology.

[23]  S. Altschul,et al.  Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. , 1994, Proceedings of the National Academy of Sciences of the United States of America.

[24]  Patrice Koehl,et al.  ASTRAL compendium enhancements , 2002, Nucleic Acids Res..

[25]  Charlie Hodgman,et al.  A historical perspective on gene/protein functional assignment , 2000, Bioinform..

[26]  Nick V Grishin,et al.  A tale of two ferredoxins: sequence similarity and structural differences , 2006 .

[27]  R. Durbin,et al.  Pfam: A comprehensive database of protein domain families based on seed alignments , 1997, Proteins.

[28]  Bin Ma,et al.  PatternHunter: faster and more sensitive homology search , 2002, Bioinform..

[29]  Robert D. Finn,et al.  The Pfam protein families database , 2004, Nucleic Acids Res..

[30]  D. Haussler,et al.  Hidden Markov models in computational biology. Applications to protein modeling. , 1993, Journal of molecular biology.

[31]  Sydney Anne Cameron,et al.  Molecular Evolution: A Phylogenetic Approach.—Roderic D. M. Page and Edward C. Holmes. , 2002 .

[32]  Sean R. Eddy,et al.  Pfam: multiple sequence alignments and HMM-profiles of protein domains , 1998, Nucleic Acids Res..

[33]  Louxin Zhang,et al.  Good spaced seeds for homology search , 2004, Bioinform..

[34]  M. Madera,et al.  A comparison of profile hidden Markov model procedures for remote homology detection. , 2002, Nucleic acids research.

[35]  S. Henikoff,et al.  Position-based sequence weights. , 1994, Journal of molecular biology.

[36]  D. Haussler,et al.  Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. , 1998, Journal of molecular biology.

[37]  D. Lipman,et al.  Rapid and sensitive protein similarity searches. , 1985, Science.

[38]  J. Kendrew,et al.  The amino-acid sequence of sperm whale myoglobin. Comparison between the amino-acid sequences of sperm whale myoglobin and of human hemoglobin. , 1961, Nature.

[39]  Martin Vingron,et al.  A fast and sensitive multiple sequence alignment algorithm , 1989, Comput. Appl. Biosci..

[40]  C. Chothia,et al.  Volume changes in protein evolution. , 1994, Journal of molecular biology.

[41]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[42]  Shmuel Pietrokovski,et al.  The Blocks database--a system for protein classification , 1996, Nucleic Acids Res..

[43]  M. O. Dayhoff,et al.  Atlas of protein sequence and structure , 1965 .

[44]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[45]  Nick V. Grishin,et al.  Structural drift: a possible path to protein fold change , 2005, Bioinform..

[46]  Russell F. Doolittle,et al.  On the trail of protein sequences , 2000, Bioinform..

[47]  Johannes Söding,et al.  Protein homology detection by HMM-HMM comparison , 2005, Bioinform..

[48]  Nebojsa Jojic,et al.  Efficient approximations for learning phylogenetic HMM models from data , 2004, ISMB/ECCB.

[49]  Chris Sander,et al.  Removing near-neighbour redundancy from large protein sequence collections , 1998, Bioinform..

[50]  C. Sander,et al.  The FSSP database of structurally aligned protein fold families. , 1994, Nucleic acids research.

[51]  G. Stormo Consensus patterns in DNA. , 1990, Methods in enzymology.

[52]  William Noble Grundy,et al.  Family-based homology detection via pairwise sequence comparison , 1998, RECOMB '98.

[53]  S. B. Needleman,et al.  A general method applicable to the search for similarities in the amino acid sequence of two proteins. , 1970, Journal of molecular biology.

[54]  S. Altschul Amino acid substitution matrices from an information theoretic perspective , 1991, Journal of Molecular Biology.

[55]  Elena Rivas,et al.  Evolutionary models for insertions and deletions in a probabilistic modeling framework , 2005, BMC Bioinformatics.

[56]  G. Mitchison A Probabilistic Treatment of Phylogeny and Sequence Alignment , 1999, Journal of Molecular Evolution.

[57]  Jeremy Buhler,et al.  Designing multiple simultaneous seeds for DNA similarity search , 2004, J. Comput. Biol..

[58]  S F Altschul,et al.  Local alignment statistics. , 1996, Methods in enzymology.

[59]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[60]  Amir Dembo,et al.  Strong limit theorems of empirical functionals for large exceedances of partial sums of i , 1991 .

[61]  C. Chothia,et al.  Intermediate sequences increase the detection of homology between sequences. , 1997, Journal of molecular biology.

[62]  T. D. Schneider,et al.  Use of the 'Perceptron' algorithm to distinguish translational initiation sites in E. coli. , 1982, Nucleic acids research.

[63]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[64]  Rod A Wing,et al.  Sequence, annotation, and analysis of synteny between rice chromosome 3 and diverged grass species. , 2005, Genome research.

[65]  R F Doolittle,et al.  Simian sarcoma virus onc gene, v-sis, is derived from the gene (or genes) encoding a platelet-derived growth factor. , 1983, Science.

[66]  Jeremy Buhler,et al.  Choosing the best heuristic for seeded alignment of DNA sequences , 2006, BMC Bioinformatics.

[67]  Bertil Schmidt,et al.  Hyper customized processors for bio-sequence database scanning on FPGAs , 2005, FPGA '05.

[68]  P. Schultz,et al.  Comparative analysis of human genome assemblies reveals genome-level differences. , 2002, Genomics.

[69]  Tim J. P. Hubbard,et al.  SCOP: a Structural Classification of Proteins database , 1999, Nucleic Acids Res..

[70]  Torbjørn Rognes,et al.  Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors , 2000, Bioinform..

[71]  W. J. Kent,et al.  BLAT--the BLAST-like alignment tool. , 2002, Genome research.

[72]  Pat Hanrahan,et al.  ClawHMMER: A Streaming HMMer-Search Implementation , 2005, SC.

[73]  S. Henikoff,et al.  Amino acid substitution matrices from protein blocks. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[74]  David Haussler,et al.  Dirichlet mixtures: a method for improved detection of weak but significant protein sequence homology , 1996, Comput. Appl. Biosci..

[75]  Patrick Crowley,et al.  Exploiting coarse-grained parallelism to accelerate protein motif finding with a network processor , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).

[76]  Kimmen Sjölander,et al.  COACH: profile-profile alignment of protein families using hidden Markov models , 2003 .

[77]  N. Grishin,et al.  COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. , 2003, Journal of molecular biology.

[78]  Richard Hughey,et al.  Hidden Markov models for detecting remote protein homologies , 1998, Bioinform..

[79]  David Haussler,et al.  Combining Phylogenetic and Hidden Markov Models in Biosequence Analysis , 2004, J. Comput. Biol..

[80]  K Karplus,et al.  Predicting protein structure using only sequence information , 1999, Proteins.

[81]  Richard Hughey,et al.  Calibrating E-values for hidden Markov models using reverse-sequence null models , 2005, Bioinform..

[82]  Y. Matsuo,et al.  Exploration of novel motifs derived from mouse cDNA sequences. , 2002, Genome research.

[83]  T. D. Schneider,et al.  Information content of binding sites on nucleotide sequences. , 1986, Journal of molecular biology.

[84]  Robert D. Finn,et al.  Pfam: clans, web tools and services , 2005, Nucleic Acids Res..

[85]  Patrice Koehl,et al.  The ASTRAL compendium for protein structure and sequence analysis , 2000, Nucleic Acids Res..

[86]  P. Arruda,et al.  Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane , 2006 .

[87]  김동규,et al.  [Book review] "Algorithms on Strings, Trees, and Sequences" , 2000 .

[88]  N. Grishin,et al.  KH domain: one motif, two folds. , 2001, Nucleic acids research.

[89]  D. Lipman,et al.  Rapid similarity searches of nucleic acid and protein data banks. , 1983, Proceedings of the National Academy of Sciences of the United States of America.