Human BAC ends quality assessment and sequence analyses.

End sequences from bacterial artificial chromosomes (BACs) provide highly specific sequence markers in large-scale sequencing projects. To date, we have generated >300,000 end sequences from >186,000 human BAC clones, with an average read length of >460 bp, for a total of 141 Mb covering approximately 4.7% of the genome. Over 60% of the clones have BAC end sequences (BESs) from both ends, and these paired-end clones represent more than fivefold coverage of the human genome. Our quality assessments and sequence analyses indicate that BESs from the human BAC libraries developed at the California Institute of Technology (CalTech) and at Roswell Park Cancer Institute have similar properties. The analyses also revealed differences in insert size among different segments of the CalTech library. In some subsets of the overall BES dataset, we observed problems with the fidelity of tracking sequence data back to the physical clones. Annotation of the BESs against available genomic sequences, sequence-tagged sites, expressed sequence tags, protein-encoding regions, and repeats indicates that this resource will be valuable in many areas of genome research.
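The coverage figures above follow from simple arithmetic, sketched below. Two assumptions are not stated in the abstract and are labeled as such: a haploid genome size of about 3,000 Mb (consistent with 141 Mb being roughly 4.7% of the genome) and a hypothetical average BAC insert size of 150 kb. The helper names `sequence_coverage` and `clone_coverage` are illustrative only.

```python
# Back-of-envelope coverage arithmetic for a BAC end-sequencing project.
# ASSUMPTIONS (not from the abstract): ~3,000 Mb haploid genome, and a
# hypothetical mean BAC insert size of 150 kb. Both are round numbers.

GENOME_SIZE_BP = 3_000_000_000  # assumed haploid human genome size
MEAN_INSERT_BP = 150_000        # hypothetical mean BAC insert size


def sequence_coverage(total_bases_bp: float,
                      genome_bp: float = GENOME_SIZE_BP) -> float:
    """Fraction of the genome contained in the end-sequence reads themselves."""
    return total_bases_bp / genome_bp


def clone_coverage(n_clones: float,
                   insert_bp: float = MEAN_INSERT_BP,
                   genome_bp: float = GENOME_SIZE_BP) -> float:
    """Physical coverage: how many times the clone inserts span the genome."""
    return n_clones * insert_bp / genome_bp


if __name__ == "__main__":
    # Read count times mean read length; both are lower bounds in the abstract,
    # so this lands slightly below the reported 141 Mb.
    approx_bases = 300_000 * 460
    print(f"approx. sequenced bases: {approx_bases / 1e6:.0f} Mb")

    # Fraction of the genome covered by the reported 141 Mb of end sequence.
    print(f"sequence coverage: {sequence_coverage(141e6):.1%}")

    # Clones with reads from both ends: over 60% of >186,000 clones.
    paired_clones = 0.60 * 186_000
    print(f"paired-end clone coverage: {clone_coverage(paired_clones):.1f}x")
```

Under these assumptions the script reports about 4.7% sequence coverage and roughly 5.6-fold clone coverage. The physical-coverage estimate scales linearly with the assumed insert size, so insert-size differences between libraries, or between segments of one library, shift it directly.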
