Efficient oligonucleotide probe selection for pan-genomic tiling arrays

Background
Array comparative genomic hybridization is a fast and cost-effective method for detecting, genotyping, and comparing the genomic sequences of unknown bacterial isolates. This method, as with all microarray applications, requires adequate coverage of probes targeting the regions of interest. An unbiased tiling of probes across the entire length of the genome is the most flexible design approach. However, such a whole-genome tiling requires that the genome sequence be known in advance. For the accurate analysis of uncharacterized bacteria, an array must query a fully representative set of sequences from the species' pan-genome. Prior microarrays have included only a single strain per array or only the conserved sequences of gene families. Such arrays omit potentially important genes and sequence variants from the pan-genome.

Results
This paper presents a new probe selection algorithm, PanArray, that can tile multiple whole genomes using a minimal number of probes. Unlike arrays built on clustered gene families, PanArray uses an unbiased, probe-centric approach that does not rely on annotations, gene clustering, or multiple alignments. Instead, probes are evenly tiled across all sequences of the pan-genome at a consistent level of coverage. To minimize the required number of probes, probes conserved across multiple strains in the pan-genome are selected first, and additional probes are used only where necessary to span polymorphic regions of the genome. The viability of the algorithm is demonstrated by array designs for seven different bacterial pan-genomes and, in particular, by the design of a 385,000-probe array that fully tiles the genomes of 20 different Listeria monocytogenes strains with overlapping probes at greater than twofold coverage.

Conclusion
PanArray is an oligonucleotide probe selection algorithm for tiling multiple genome sequences using a minimal number of probes. It is capable of fully tiling all genomes of a species on a single microarray chip. These unique pan-genome tiling arrays provide maximum flexibility for the analysis of both known and uncharacterized strains.
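The "conserved probes first, extra probes only at polymorphic regions" strategy can be illustrated with a toy greedy selection. The sketch below is not PanArray's implementation. It assumes a deliberately simplified model: a probe is an exact k-mer taken from one of the input genomes, it "covers" every genome position it overlaps wherever it occurs exactly (reverse complements, mismatches, and probe-quality filters are ignored), and each position should be overlapped by `coverage` distinct probes. Selection is framed as greedy set multicover, which is our assumption. The names `kmer_positions`, `select_probes`, `k`, and `coverage` are hypothetical.

```python
from collections import defaultdict


def kmer_positions(genomes, k):
    """Map each candidate probe (an exact k-mer) to the set of
    (genome name, position) cells it overlaps in any genome."""
    covers = defaultdict(set)
    for name, seq in genomes.items():
        for start in range(len(seq) - k + 1):
            probe = seq[start:start + k]
            covers[probe].update((name, p) for p in range(start, start + k))
    return covers


def select_probes(genomes, k, coverage):
    """Greedy multicover (hypothetical simplification): repeatedly pick the
    probe that raises the most still-under-covered cells."""
    covers = kmer_positions(genomes, k)

    # A cell's demand is the target coverage, capped by the number of
    # distinct probes that can overlap it (genome ends cannot reach full depth).
    available = defaultdict(int)
    for cells in covers.values():
        for cell in cells:
            available[cell] += 1
    demand = {cell: min(coverage, n) for cell, n in available.items()}

    depth = defaultdict(int)   # number of selected probes overlapping each cell
    remaining = dict(covers)   # candidate probes not yet chosen
    chosen = []
    while True:
        best, best_gain = None, 0
        for probe, cells in remaining.items():
            # A probe shared by many strains overlaps cells in each of them,
            # so conserved probes tend to be chosen before strain-specific ones.
            gain = sum(1 for cell in cells if depth[cell] < demand[cell])
            if gain > best_gain:
                best, best_gain = probe, gain
        if best is None:       # every cell has met its demand
            break
        for cell in remaining.pop(best):
            depth[cell] += 1
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Two hypothetical strains that differ near their 3' ends.
    genomes = {"strainA": "ACGTACGGTTCA", "strainB": "ACGTACGGTACA"}
    print(select_probes(genomes, k=4, coverage=2))
```

Because a probe's gain counts positions in every strain where it matches, a sequence shared by all strains outscores a strain-specific one, and strain-specific probes are added only where positions remain under-covered. The full scan over candidates on each iteration keeps the sketch short but only scales to toy inputs.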