Flexible and efficient genome tiling design with penalized uniqueness score

Background: As a powerful tool for whole-genome analysis, the tiling array has been widely used to address many genomic questions. It can now also serve as a capture device for library preparation in high-throughput sequencing experiments. A flexible and efficient tiling array design approach is therefore still needed, and could support transcriptomic experiments of various types and scales.

Results: In this paper, we address issues and challenges in designing probes suitable for tiling array applications and targeted sequencing. In particular, we define the penalized uniqueness score, which serves as a controlling criterion to eliminate potential cross-hybridization, and present a flexible tiling array design pipeline. Compared with BLAST or simple suffix-array-based methods, computing and using our uniqueness measure can be more efficient for large-scale design and can require less memory. The parameters provided can be adjusted to support various types of genomic tiling tasks. In addition, using both commercial array data and experimental data, we show that, contrary to previous claims, palindromic sequences exhibit relatively lower uniqueness.

Conclusions: Our proposed penalized uniqueness score could serve as a better indicator of cross-hybridization, with higher sensitivity and specificity, giving more control over expected array quality. The flexible tiling design algorithm incorporating the penalized uniqueness score was shown to give higher coverage and resolution. The package to calculate the penalized uniqueness score and the described probe selection algorithm are implemented as a Perl program, which is freely available at http://www1.fbn-dummerstorf.de/en/forschung/fbs/fb3/paper/2012-yang-1/OTAD.v1.1.tar.gz.