Physical map-assisted whole-genome shotgun sequence assemblies.

We describe a targeted approach to improve the contiguity of whole-genome shotgun sequence (WGS) assemblies at run-time, using information from Bacterial Artificial Chromosome (BAC)-based physical maps. Clone sizes and overlaps derived from clone fingerprints are used to calculate length constraints between any two neighboring BACs that share 40% of their size. These constraints promote the linkage and guide the arrangement of sequence contigs within a sequence scaffold at the layout phase of WGS assembly. The process is facilitated by FASSI, a stand-alone application that calculates BAC end and BAC overlap length constraints from clone fingerprint map contigs created by the FPC package. FASSI is designed to work with the assembly tool PCAP, but its output can be formatted to work with other WGS assembly algorithms that can use length constraints for individual clones. The FASSI method is simple to implement and potentially cost-effective, and it increased scaffold contiguity for both the Drosophila melanogaster and Cryptococcus gattii genomes relative to a control assembly without map-derived constraints. A 6.5-fold coverage draft DNA sequence of the Pan troglodytes (chimpanzee) genome, assembled using map-derived constraints, showed a 26.1% increase in scaffold contiguity.
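
The idea of deriving a length constraint from two overlapping map neighbors can be illustrated with a minimal Python sketch. It is not FASSI's actual algorithm or output format. The `Clone` record, the function names, the reading of the 40% rule as "overlap of at least 40% of the smaller clone", and the 10% tolerance are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Clone:
    """Hypothetical record for a BAC placed in a fingerprint map contig."""
    name: str
    start: int  # estimated left coordinate in the map, in bp (assumed unit)
    size: int   # estimated clone size from its fingerprint, in bp


def overlap_bp(a: Clone, b: Clone) -> int:
    """Estimated number of bases shared by two clones (0 if disjoint)."""
    right = min(a.start + a.size, b.start + b.size)
    left = max(a.start, b.start)
    return max(0, right - left)


def neighbor_constraint(
    a: Clone,
    b: Clone,
    min_shared: float = 0.40,  # assumed reading of the 40% rule
    tolerance: float = 0.10,   # hypothetical slack on the estimated span
) -> Optional[Tuple[str, str, int, int]]:
    """Return (name_a, name_b, min_span, max_span), or None if the
    clones do not share enough of the smaller clone's length."""
    shared = overlap_bp(a, b)
    if shared < min_shared * min(a.size, b.size):
        return None
    # Distance covered by the two clones together, end to end.
    span = a.size + b.size - shared
    slack = int(round(span * tolerance))
    return (a.name, b.name, span - slack, span + slack)


if __name__ == "__main__":
    a = Clone("bacA", 0, 150_000)
    b = Clone("bacB", 80_000, 160_000)
    print(neighbor_constraint(a, b))
```

An assembler that accepts per-clone length constraints could, in principle, treat the returned range as a bound on the distance between the outer end reads of the two neighbors, in the same way it treats the insert-size range of a single clone.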
