Genome Sequencing and Annotation

Many microbial genome sequences have been determined, and further genome projects are under way. Shotgun sequencing of randomly cloned short fragments of genomic DNA offers a simple route to a whole genome sequence. The process requires sequencing many fragments, assembling the separate reads into one contiguous sequence, and carefully editing the assembled sequence. Genes on the microbial genome are then predicted from typical gene features, such as codon usage, ribosome-binding sequences, and bacterial initiation codons. Gene function is predicted by homology searches against public or otherwise well-established protein databases. This chapter discusses each of these stages of a genome-sequencing project.
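The gene-prediction step can be illustrated with a minimal sketch. It is not the chapter's method: it only scans both strands for open reading frames that begin at a common bacterial initiation codon and end at a stop codon, and it ignores codon usage and ribosome-binding sequences entirely. The names `find_orfs`, `reverse_complement`, and `min_codons` are hypothetical, and the choice of ATG, GTG, and TTG as start codons is an assumption of this example.

```python
# Toy open-reading-frame (ORF) scan; an illustration, not a real gene finder.
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of bacterial initiation codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=3):
    """Return a list of (strand, start, end, orf_sequence) tuples.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    that was scanned (for "-" that is the reverse complement).
    min_codons counts the stop codon as well.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first start codon in the open ORF
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTGGGTAACC"):
        print(orf)
```

Candidates found this way would normally be filtered further, for example by length and by the sequence features listed above, before any homology search is run on them.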
