Selection of optimal DNA oligos for gene expression arrays

MOTIVATION: High-density DNA oligonucleotide microarrays are widely used in biomedical research, and selecting the optimal DNA oligos to deposit on them is critical. Based on sequence information and hybridization free energy, we developed a new algorithm that selects optimal short (20-25 bases) or long (50 or 70 bases) oligos from genes or open reading frames (ORFs) and predicts their hybridization behavior. Having optimized probes for each gene is valuable for two reasons. First, by minimizing background hybridization, they yield more accurate measurements of true expression levels. Second, optimal probes minimize the number of probes needed per gene. This lowers the cost of each microarray, raises the number of genes that fit on each chip, and broadens its usage.

RESULTS: In this paper we describe algorithms that optimize the selection of specific probes for every gene in an entire genome. The criteria for truly optimal probes are easy to state, but they cannot currently be computed at all levels. We have therefore developed a heuristic approach that is efficiently computable at all levels and should provide a good approximation to the truly optimal set. We have run the program on the complete genomes of several model organisms and deposited the results in a database that is available online (http://ural.wustl.edu/~lif/probe.pl).

AVAILABILITY: The program is available upon request.
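The abstract does not specify the energy model or the selection heuristic. The Python below is only a minimal sketch under stated assumptions and is not the authors' algorithm. It assumes the SantaLucia (1998) unified nearest-neighbor ΔG°37 parameters stand in for "hybridization free energy". It also assumes that a brute-force, ungapped Hamming-distance scan against other genes stands in for the genome-wide specificity check. The function names (`duplex_dg37`, `min_mismatches`, `select_probes`) and the `target_dg` and `min_mm` parameters are hypothetical.

```python
"""Hypothetical probe-selection sketch (not the paper's algorithm)."""

# SantaLucia (1998) unified nearest-neighbor ΔG°37 values, kcal/mol, 1 M NaCl.
# Keys are 5'->3' dinucleotide steps read on the probe strand.
NN_DG37 = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
INIT_GC = 0.98  # initiation term per terminal G·C pair
INIT_AT = 1.03  # initiation term per terminal A·T pair
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_dg37(probe: str) -> float:
    """Predicted ΔG°37 of the perfect-match duplex (no symmetry correction)."""
    dg = 0.0
    for i in range(len(probe) - 1):
        step = probe[i:i + 2]
        # The six steps absent from the table equal their reverse complement,
        # e.g. 5'-TT-3'/3'-AA-5' is the same stack as 5'-AA-3'/3'-TT-5'.
        dg += NN_DG37[step] if step in NN_DG37 else NN_DG37[revcomp(step)]
    for end in (probe[0], probe[-1]):
        dg += INIT_GC if end in "GC" else INIT_AT
    return dg


def min_mismatches(probe: str, others: list[str]) -> int:
    """Smallest Hamming distance between the probe and any same-length window
    on either strand of the other sequences (brute force, ungapped)."""
    k = len(probe)
    best = k
    for seq in others:
        for strand in (seq, revcomp(seq)):
            for i in range(len(strand) - k + 1):
                window = strand[i:i + k]
                best = min(best, sum(a != b for a, b in zip(probe, window)))
    return best


def select_probes(gene: str, others: list[str], length: int,
                  target_dg: float, n: int = 3, min_mm: int = 3):
    """Return up to n non-overlapping (start, probe, ΔG) tuples whose ΔG is
    closest to target_dg, skipping windows within min_mm mismatches of
    another sequence."""
    candidates = []
    for i in range(len(gene) - length + 1):
        probe = gene[i:i + length]
        if min_mismatches(probe, others) < min_mm:
            continue  # likely to cross-hybridize with another gene
        candidates.append((i, probe, duplex_dg37(probe)))
    chosen = []
    # sorted() is stable, so ties keep their order along the gene.
    for i, probe, dg in sorted(candidates, key=lambda c: abs(c[2] - target_dg)):
        if all(abs(i - j) >= length for j, _, _ in chosen):
            chosen.append((i, probe, dg))
            if len(chosen) == n:
                break
    return chosen


if __name__ == "__main__":
    # Worked example: CG + GT + TT + TG + GA stacks plus two initiation terms.
    print(round(duplex_dg37("CGTTGA"), 2))  # -5.35
```

Ranking candidates by closeness to a shared ΔG target is one illustrative way to make the probes on a chip hybridize at similar stringency. The quadratic scan in `min_mismatches` is only practical for toy inputs. A whole-genome run would need an index-based or bit-parallel approximate-matching search instead.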
