An information theoretic method of microarray probe design for genome classification

In recent years, oligo microarrays, more commonly known as DNA chips, have had a major impact on disease diagnosis, drug discovery, and gene identification. A microarray contains N-mer DNA fragments, or oligos, in a series of “wells” placed across the chip; each well contains thousands of copies of the same fragment and acts as a probe that detects the amount of a specific fragment. A recent use of microarrays is the identification of genomes, such as those of pathogens. In current techniques, probes that detect gene regions unique to a particular species are selected for placement on the microarray, on the assumption that if one gene unique to a pathogen species can be detected, then the pathogen can be classified. This approach is useful, but it relies on finding gene sequences that are divergent enough to serve as a genomic identifier and that are robust to cross-hybridization. In this work, we present a method for choosing the probes that best distinguish two organisms. We accomplish this by choosing the oligo probes that maximize the divergence between the genomes, as calculated by three different information-theoretic measures. We present results for 12-mer and 25-mer oligo pathogen probe sets and show that our method chooses probes that are less likely to cross-hybridize.
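To make the idea of divergence-maximizing probe selection concrete, the following is a minimal sketch rather than the method itself. The abstract does not name its three information-theoretic measures, so this example assumes one plausible stand-in: each k-mer is scored by its pointwise contribution to the Kullback-Leibler divergence between the k-mer frequency distributions of two genomes. The function names (`kmer_counts`, `rank_probes`), the additive pseudocount, and the toy sequences are all hypothetical.

```python
from collections import Counter
from math import log2


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def rank_probes(genome_a, genome_b, k, top_n=5, pseudocount=1.0):
    """Rank candidate k-mer probes for genome_a against genome_b.

    ASSUMPTION: the score is the pointwise KL term p * log2(p / q),
    a stand-in for the measures the abstract leaves unspecified.
    """
    counts_a = kmer_counts(genome_a, k)
    counts_b = kmer_counts(genome_b, k)
    vocab = set(counts_a) | set(counts_b)

    # Additive smoothing keeps q > 0 so the log ratio is always defined.
    total_a = sum(counts_a.values()) + pseudocount * len(vocab)
    total_b = sum(counts_b.values()) + pseudocount * len(vocab)

    scores = {}
    for kmer in vocab:
        p = (counts_a[kmer] + pseudocount) / total_a
        q = (counts_b[kmer] + pseudocount) / total_b
        scores[kmer] = p * log2(p / q)

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    # Toy sequences, not real genomes.
    for kmer, score in rank_probes("ACGTACGTACGT", "TTTTGGGGCCCC", k=4, top_n=3):
        print(kmer, round(score, 3))
```

A k-mer that is frequent in the first genome and rare in the second receives a high score, which matches the intuition that such a probe is less likely to bind the wrong organism. A practical probe-design pipeline would presumably also account for reverse complements and hybridization chemistry, which this sketch ignores.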