Physical Mapping by STS Hybridization: Algorithmic Strategies and the Challenge of Software Evaluation

The physical map is an important tool in the analysis of genomic sequences. In this paper we examine the construction of physical maps from hybridization data between sequence-tagged site (STS) probes and clones of genomic fragments. We give an algorithmic theory of the mapping process, a proposed performance evaluation procedure, and several new algorithmic strategies for mapping. A unifying theme for these developments is the idea of a "conservative extension": an algorithm, measure of algorithm quality, or description of a physical map is a conservative extension if it generalizes, to data with errors, a corresponding concept from the error-free case. In our algorithmic theory we show that the nature of hybridization experiments imposes inherent limitations on the mapping information recorded in the experimental data, and we prove that only certain types of mapping information can be reliably calculated by any algorithm. We then present a test generator, along with quantitative measures for determining how much of the recoverable information a given algorithm computes, and we discuss the strengths and weaknesses of these measures. Each of the new algorithms presented in this paper is based on a combinatorial optimization problem. Although all of these optimization problems are NP-complete, we have developed algorithmic tools for the design of competitive approximation algorithms. We apply our performance evaluation program to our algorithms and obtain solid evidence that the algorithms are capable of retrieving high-level, reliable mapping information.
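To make the idea of a conservative extension concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not one of the algorithms or quality measures of the paper. It assumes a 0/1 clone-by-probe hybridization matrix and unique probes, so that in error-free data the correct probe order places each clone's hits consecutively. The hypothetical function `ordering_cost` counts the breaks in each clone's run of hits. It is zero exactly when every row is consecutive under the given order, and it remains well defined when the data contain false positives or false negatives.

```python
from typing import List, Sequence


def count_gaps(row: Sequence[int]) -> int:
    """Return the number of blocks of 1s in the row beyond the first.

    The result is 0 when the 1s are consecutive (or the row has no 1s).
    """
    blocks = 0
    prev = 0
    for x in row:
        if x == 1 and prev == 0:
            blocks += 1  # a new run of hits starts here
        prev = x
    return max(blocks - 1, 0)


def ordering_cost(matrix: List[List[int]], order: Sequence[int]) -> int:
    """Total number of gaps over all clones when probes are arranged in `order`.

    `matrix[i][j]` is 1 if clone i hybridizes to probe j (assumed layout).
    """
    return sum(count_gaps([row[j] for j in order]) for row in matrix)


# Error-free toy data: three clones over probes 0..3, overlapping in a chain.
clean = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]

# The same data with one false positive (clone 0, probe 3).
noisy = [
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]

true_order = [0, 1, 2, 3]
wrong_order = [0, 2, 1, 3]

print(ordering_cost(clean, true_order))   # true order on error-free data
print(ordering_cost(clean, wrong_order))  # a wrong order splits some rows
print(ordering_cost(noisy, true_order))   # the false positive adds a gap
```

On error-free data the cost of the true order is zero. With errors no order need reach zero, so one would look for an order of low cost instead, which is the kind of combinatorial optimization the abstract refers to in general terms.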
