Information theoretical probe selection for hybridisation experiments

MOTIVATION: The choice of probes is an important feature of hybridisation experiments. In this paper we present an algorithm that optimises probe sets with respect to a training set of sequences, using Shannon entropy as the quality criterion. The practical motivation for our algorithm is oligonucleotide fingerprinting. This method identifies many sequences (cDNA or genomic DNA) simultaneously by their hybridisation tags with respect to a set of short probes such as octamers. The algorithm itself, however, is not restricted to that application.

RESULTS: We show that our method outperforms two alternatives: the widely used strategy of selecting probes according to their frequencies, and randomly chosen probe sets. The quality of each probe set is assessed with a simulation pipeline that takes the set of probes as a simulation parameter. In addition, the performance of probe sets trained on sequences from different organisms shows that probes should be chosen with regard to the organism under analysis. Case studies illustrate how constraints (G+C content, complexity of the individual probes) influence the selection process.

AVAILABILITY: A description of the oligonucleotide fingerprinting pipeline is published on our web page http://www.molgen.mpg.de/~ag_onf/met.htm. An executable of the algorithm and probe lists designed for human and rodent sequences can be downloaded from the FTP site ftp://ftp.molgen.mpg.de/pub/mpimg/probe_design/.
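The abstract does not spell out the optimisation procedure. As a rough illustration of entropy as a probe-selection criterion, the following minimal sketch makes three assumptions. First, hybridisation tags are binary: a probe scores 1 if it occurs as an exact substring of a sequence. Second, the quality of a probe set is the Shannon entropy of the resulting fingerprint distribution over the training sequences. Third, probes are added by a greedy forward search. The helper names (`fingerprint`, `joint_entropy`, `greedy_select`) are hypothetical, as are the G+C and distinct-base filters that stand in for the constraints mentioned above. The published algorithm may differ in any of these choices. The toy example uses trimers instead of octamers for brevity.

```python
import math
from collections import Counter


def gc_fraction(probe: str) -> float:
    """Fraction of G and C bases in a probe."""
    return sum(base in "GC" for base in probe) / len(probe)


def fingerprint(seq: str, probes: list[str]) -> tuple[int, ...]:
    """Binary hybridisation tag of one sequence.

    Assumption: a probe "hybridises" iff it occurs as an exact substring
    of the given strand; mismatches, the reverse strand and signal
    intensities are ignored.
    """
    return tuple(int(p in seq) for p in probes)


def joint_entropy(seqs: list[str], probes: list[str]) -> float:
    """Shannon entropy (bits) of the fingerprint distribution over seqs."""
    counts = Counter(fingerprint(s, probes) for s in seqs)
    n = len(seqs)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def greedy_select(
    seqs: list[str],
    candidates: list[str],
    k: int,
    gc_range: tuple[float, float] = (0.0, 1.0),
    min_distinct: int = 1,
) -> list[str]:
    """Hypothetical greedy forward selection of k probes.

    The function first discards candidates outside gc_range and candidates
    with fewer than min_distinct different bases (a crude stand-in for a
    complexity constraint). Each step then adds the probe that maximises
    the joint entropy. Ties go to the earliest candidate.
    """
    lo, hi = gc_range
    pool = [
        p for p in candidates
        if lo <= gc_fraction(p) <= hi and len(set(p)) >= min_distinct
    ]
    chosen: list[str] = []
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda p: joint_entropy(seqs, chosen + [p]))
        chosen.append(best)
        pool.remove(best)
    return chosen


if __name__ == "__main__":
    training = ["ACGT", "ACGA", "TTGT", "TTGA"]
    trimers = ["ACG", "TTG", "CGT", "TGT", "CGA", "TGA"]
    probes = greedy_select(training, trimers, k=3)
    print(probes, joint_entropy(training, probes))
    # ['ACG', 'CGT', 'TGT'] 2.0
```

Maximising the joint entropy favours probe sets whose fingerprints split the training set into many equally sized classes. For the same reason, a single probe that occurs in about half of the sequences carries more information than a very frequent one. Recomputing the entropy from scratch at every step keeps the sketch short, but it scales poorly to all 65,536 octamers. A practical implementation would cache the per-probe fingerprints.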
