Characterization of probiotic Escherichia coli isolates with a novel pan-genome microarray

Background: Microarrays have recently emerged as a novel procedure to evaluate the genetic content of bacterial species. So far, microarrays have mostly covered a single strain or a few strains of the same species. However, as cheaper high-throughput sequencing techniques emerge, genome sequences of multiple strains of the same species are rapidly becoming available, allowing for the definition and characterization of a whole species as a population of genomes, the 'pan-genome'.

Results: Using 32 Escherichia coli and Shigella genome sequences, we estimate the pan- and core genome of the species. We designed a high-density microarray to provide a tool for characterization of the E. coli pan-genome. Technical performance of this pan-genome microarray, assessed on control strain samples (E. coli K-12 and O157:H7), demonstrated high sensitivity and a relatively low false positive rate. A single-channel analysis approach is robust and allows presence/absence predictions to be derived for any gene included on our pan-genome microarray. Moreover, the array proved sufficient to investigate the gene content of non-pathogenic isolates, despite the strong bias towards pathogenic strains among the E. coli genomes sequenced so far.

Conclusion: This high-density microarray provides an excellent tool for characterizing the genetic makeup of unknown E. coli strains and can also deliver insights into phylogenetic relationships. Its design poses a considerably larger challenge, and involves different considerations, than the design of single-strain microarrays. Lessons learned and future directions are discussed with the aim of optimizing the design of microarrays that target entire pan-genomes.
