Efficient Computational Design of Tiling Arrays Using a Shortest Path Approach

Genomic tiling arrays are a type of DNA microarray that can interrogate the complete genome of any species whose sequence is known. Designing or selecting suitable oligonucleotide probes for such arrays is, however, computationally difficult if features such as oligonucleotide quality and repetitive regions are to be considered. We formulate the minimal cost tiling path problem for the selection of oligonucleotides from a set of candidates and show that it is equivalent to a shortest path problem. An efficient implementation of Dijkstra's shortest path algorithm allows us to compute globally optimal tiling paths from millions of candidate oligonucleotides on a standard desktop computer. The solution to this multi-criterion optimization is spatially adaptive to the problem instance. Our formulation readily incorporates experimental constraints on specific regions of interest as well as tradeoffs between hybridization parameters, probe quality and tiling density. For the basic formulation, solutions can be obtained more efficiently using Monge theory.
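
The reduction to a shortest path problem can be illustrated with a minimal sketch. It is not the authors' implementation, and the cost model is an assumption made for illustration: each candidate is a hypothetical tuple `(start, length, quality_cost)`, an edge joins two non-overlapping candidates whose gap is at most `max_gap`, and the edge weight is the quality cost of the next probe plus `gap_weight` times the gap. The names `min_cost_tiling_path`, `max_gap` and `gap_weight` are likewise invented here.

```python
import heapq


def min_cost_tiling_path(candidates, genome_length, max_gap, gap_weight=1.0):
    """Return (total_cost, chosen_indices) or None if no tiling path exists.

    candidates: list of (start, length, quality_cost) tuples (assumed layout).
    A virtual source sits at position 0 and a virtual sink at genome_length.
    """
    n = len(candidates)
    SOURCE, SINK = n, n + 1

    def end_of(u):
        # End coordinate of node u; the source "ends" at position 0.
        return 0 if u == SOURCE else candidates[u][0] + candidates[u][1]

    def successors(u):
        # Naive O(n) scan per node; a real implementation would restrict
        # the scan to a window of a position-sorted candidate list.
        u_end = end_of(u)
        for v, (start, _, quality_cost) in enumerate(candidates):
            gap = start - u_end
            if 0 <= gap <= max_gap:
                yield v, quality_cost + gap_weight * gap
        tail = genome_length - u_end
        if 0 <= tail <= max_gap:
            yield SINK, gap_weight * tail

    # Standard Dijkstra with a binary heap and lazy deletion.
    dist = {SOURCE: 0.0}
    prev = {}
    heap = [(0.0, SOURCE)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == SINK:
            break
        for v, w in successors(u):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if SINK not in dist:
        return None

    # Walk predecessors back from the sink to recover the chosen probes.
    path = []
    node = prev[SINK]
    while node != SOURCE:
        path.append(node)
        node = prev[node]
    path.reverse()
    return dist[SINK], path
```

Because all edge weights in this assumed cost model are non-negative, Dijkstra's algorithm returns a globally optimal path. Raising `gap_weight` favours denser tilings, while lowering it favours higher-quality probes at the expense of coverage.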
