Statistical Significance of Optical Map Alignments

The Optical Mapping System constructs ordered restriction maps spanning entire genomes through the assembly and analysis of large datasets comprising individually analyzed genomic DNA molecules. Such restriction maps uniquely reveal mammalian genome structure and variation, but also raise computational and statistical questions beyond those that have been solved in the analysis of smaller, microbial genomes. We address the problem of how to filter maps that align poorly to a reference genome. We obtain map-specific thresholds that control errors and improve iterative assembly. We also show how an optimal self-alignment score provides an accurate approximation to the probability of alignment, which is useful in applications seeking to identify structural genomic abnormalities.
