Random Covering of Multiple One-Dimensional Domains with an Application to DNA Sequencing

Classical results for randomly covering a one-dimensional domain are generalized to multiple domains. The density function for the number of gaps is derived in the context of Bell's polynomials, and its limiting forms are determined. The multiple-domain configuration is a good model for DNA sequencing scenarios in which the target is fragmented, e.g., filtered DNA libraries and macronuclear genomes. Large-scale sequencing efforts are now starting to focus on such projects. Fragmentation effects are most prominent for small targets but vanish for very large targets; in that limit, the model converges to classical theory. Pyrosequencing has been suggested as a viable, much cheaper alternative for large filtered projects. However, our model indicates that a recently demonstrated microscale Sanger reaction will likely be far more effective.
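The multiple-domain covering problem described above can be explored numerically. The following is a minimal Monte Carlo sketch, not the Bell-polynomial derivation of the paper: it assumes discrete base positions, a fixed read length, and reads that must fit entirely inside a single domain with uniformly chosen start positions. The names `count_gaps` and `mean_gaps` are hypothetical and introduced only for illustration.

```python
import random


def count_gaps(domain_lengths, read_length, n_reads, rng):
    """Place n_reads reads on the domains and count maximal uncovered runs.

    Assumption (not from the paper): a read must lie wholly inside one
    domain, and every feasible start position is equally likely.
    """
    # Number of feasible start positions in each domain.
    starts = [max(length - read_length + 1, 0) for length in domain_lengths]
    covered = [[False] * length for length in domain_lengths]

    if sum(starts) > 0:
        domains = range(len(domain_lengths))
        for _ in range(n_reads):
            # Choose a domain in proportion to its feasible start positions,
            # then a start position uniformly within that domain.
            k = rng.choices(domains, weights=starts)[0]
            s = rng.randrange(starts[k])
            for i in range(s, s + read_length):
                covered[k][i] = True

    # A gap is a maximal run of uncovered positions within one domain.
    gaps = 0
    for cov in covered:
        prev_covered = True
        for c in cov:
            if not c and prev_covered:
                gaps += 1
            prev_covered = c
    return gaps


def mean_gaps(domain_lengths, read_length, n_reads, trials=200, seed=0):
    """Average the gap count over independent simulated projects."""
    rng = random.Random(seed)
    total = sum(
        count_gaps(domain_lengths, read_length, n_reads, rng)
        for _ in range(trials)
    )
    return total / trials
```

To see how fragmentation changes the gap count under these assumptions, one could compare `mean_gaps([2000], 50, 200)` with `mean_gaps([100] * 20, 50, 200)`, which have the same total target length and number of reads.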
