Chapter 3 - An introduction to biological sequence analysis

This chapter introduces biological sequence analysis. Sequences arise in biological research because of the polymeric nature of the major biological macromolecules: nucleic acids and proteins. The fundamental processes of molecular biology involve the capture of the human genetic code in long molecules of DNA, which are packaged into chromosomes in the cell nucleus. DNA is composed of a long chain of subunits, the nucleotides, which carry the bases adenine, guanine, cytosine, and thymine (abbreviated A, G, C, and T, respectively). DNA is usually found in the form of two paired chains, held together by hydrogen bonds between complementary pairs of nucleotides. A on one strand always pairs with T on the other, and G always pairs with C. Most of the important structural and functional components of a human cell are proteins, which are long, folded chains (polypeptides) built from the 20 common amino acids.
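
The pairing rule described above (A with T, G with C) means that one strand determines the other. The following is a minimal Python sketch of that idea, not code from the chapter. The function names `complement` and `reverse_complement` are illustrative, and the sketch assumes input restricted to the four letters A, G, C, and T. Reversing the paired strand relies on the standard fact that the two strands of a DNA duplex run in opposite directions, which the text above does not state.

```python
# Base-pairing table taken from the rule in the text: A<->T and G<->C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of each position in a DNA strand.

    Assumes the strand contains only A, C, G, and T (case-insensitive);
    any other character raises KeyError.
    """
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the paired strand written in its own reading direction.

    The two strands of a duplex are antiparallel, so the partner strand
    is conventionally reported as the reversed complement.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("GATTACA"))          # CTAATGT
    print(reverse_complement("GATTACA"))  # TGTAATC
```

A dictionary lookup keeps the pairing rule explicit and makes unexpected symbols fail loudly rather than pass through silently. Real sequence data often contains ambiguity codes such as N, which this sketch deliberately does not handle.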
