Contig selection in physical mapping

In physical mapping, one orders a set of genetic landmarks or a library of cloned DNA fragments according to their position in the genome. This is a preparatory step for efficient sequencing. Our approach divides the physical mapping problem into smaller and easier subproblems by partitioning the probe set into independent parts (contigs). The focus is on selecting the sets of probes that can be grouped together into contigs. We introduce a new distance function between probes, the averaged rank distance (ARD). The ARD measures the reliability of certain probe configurations in physical maps that are generated by bootstrap resampling of the raw data, which mimics an independent repetition of the experiment in silico. The ARD measures the distances between probes within a contig and smooths the distances between probes in different contigs. It shows distinct jumps at contig borders, which makes it appropriate for contig selection by clustering. We designed a physical mapping algorithm that makes use of these observations and seems to be particularly well suited to the delineation of reliable contigs. We evaluated our method on data sets from two physical mapping projects. Compared with a physical map of Pasteurella haemolytica that was computed using simulated annealing, the newly computed map is considerably cleaner. On data from Xylella fastidiosa, the contigs produced by the new method were compared with a map produced by a group of experts, and the two maps largely agree in the definition of the contigs. The results of our method have already proven helpful in designing experiments aimed at further improving the quality of a map.
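The abstract does not give the formula for the ARD, so the following Python sketch is only an illustration of the general idea under stated assumptions. It assumes that the ARD of two probes is the mean absolute difference of their ranks over maps computed from bootstrap resamples of the clones, and that contigs are then obtained by single-linkage clustering with a fixed threshold. The function names, the nearest-neighbour ordering heuristic, the threshold, and the toy data are all hypothetical and are not taken from the paper.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of clones on which two probe columns disagree."""
    return sum(x != y for x, y in zip(a, b))


def greedy_order(clones, n_probes):
    """Hypothetical stand-in for a real map-ordering algorithm:
    chain probes by nearest neighbour in Hamming distance, starting at probe 0."""
    cols = [[row[p] for row in clones] for p in range(n_probes)]
    order, remaining = [0], set(range(1, n_probes))
    while remaining:
        last = order[-1]
        # Ties are broken by probe index so the result is deterministic.
        nxt = min(remaining, key=lambda p: (hamming(cols[last], cols[p]), p))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def bootstrap_orders(clones, n_probes, n_boot, rng, order_fn=greedy_order):
    """Resample clones with replacement and re-order the probes each time."""
    orders = []
    for _ in range(n_boot):
        sample = [rng.choice(clones) for _ in clones]
        orders.append(order_fn(sample, n_probes))
    return orders


def averaged_rank_distance(orders, n_probes):
    """Assumed ARD: mean absolute rank difference over all replicate orders.
    The absolute value makes it insensitive to map orientation."""
    ard = [[0.0] * n_probes for _ in range(n_probes)]
    for order in orders:
        rank = {p: r for r, p in enumerate(order)}
        for i, j in combinations(range(n_probes), 2):
            d = abs(rank[i] - rank[j])
            ard[i][j] += d
            ard[j][i] += d
    return [[v / len(orders) for v in row] for row in ard]


def select_contigs(ard, threshold):
    """Single-linkage clustering: link probes whose ARD is <= threshold."""
    n = len(ard)
    label = list(range(n))
    for i, j in combinations(range(n), 2):
        if ard[i][j] <= threshold:
            old, new = label[j], label[i]
            label = [new if l == old else l for l in label]
    groups = {}
    for p, l in enumerate(label):
        groups.setdefault(l, []).append(p)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy hybridization matrix (rows: clones, columns: probes), made up
    # for illustration: probes 0-2 and probes 3-5 share no clone.
    clones = [
        [1, 1, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1],
    ]
    rng = random.Random(0)
    orders = bootstrap_orders(clones, 6, 200, rng)
    ard = averaged_rank_distance(orders, 6)
    print(select_contigs(ard, threshold=2.0))
```

The ordering step is passed in as `order_fn` so that the stand-in heuristic can be replaced by an actual mapping algorithm. A fixed threshold is the simplest possible cut rule; detecting the jumps in the ARD adaptively may be closer to what the abstract describes.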
