Background
Microbial forensics is important in tracking the source of a pathogen, whether the disease is a naturally occurring outbreak or part of a criminal investigation.

Results
A method and software, SPR Opt (SNP and PCR-RFLP Optimization), for performing a comprehensive, whole-genome analysis to forensically discriminate multiple sequences are presented. Tools are described for optimizing forensic typing with Single Nucleotide Polymorphism (SNP) and PCR-Restriction Fragment Length Polymorphism (PCR-RFLP) analyses across multiple isolate sequences of a species. The PCR-RFLP analysis includes prediction and selection of optimal primers and restriction enzymes, chosen from the sequence information to maximize isolate discrimination. SPR Opt calculates all SNP or PCR-RFLP variations present in the sequences, groups them into haplotypes according to how they co-segregate across those sequences, and performs combinatoric analyses to determine which sets of haplotypes provide maximal discrimination among all the input sequences. It then identifies the set combinations that require querying membership in the fewest haplotypes (i.e., performing the fewest assays). These analyses highlight variable regions based on existing sequence data. Because these markers may also be heterogeneous among unsequenced isolates, they may be useful for characterizing the relationships among unsequenced as well as sequenced isolates. The predictions are multi-locus. Analyses of mumps and SARS viruses are summarized. For SARS virus, phylogenetic trees built from SNPs, from PCR-RFLPs, and from full genomes are compared, showing that purported phylogenies based only on SNP or PCR-RFLP variations do not match those based on multiple sequence alignment of the full genomes.

Conclusion
This is the first software to optimize the selection of forensic markers to maximize the information gained from the fewest assays, accepting whole or partial genome sequence data as input.
As more sequence data become available for multiple strains and isolates of a species, automated computational approaches such as those described here will be essential for making sense of large amounts of information and for guiding and optimizing laboratory efforts. The SPR Opt software and source code are publicly available and free for non-profit use at http://www.llnl.gov/IPandC/technology/software/softwaretitles/spropt.php.
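The Results paragraph describes three steps: finding variable positions, grouping variants that co-segregate into haplotypes, and searching for the fewest haplotypes that still separate the input sequences as finely as possible. The Python sketch below illustrates those steps for SNPs only. It is not SPR Opt's code, and every function name is hypothetical. It assumes the sequences are already aligned and of equal length, treats gap characters like any other allele, measures "maximal discrimination" as the number of distinct sequence classes obtainable from all haplotypes together, and uses a brute-force search over haplotype combinations whose cost grows exponentially with the number of haplotypes.

```python
from itertools import combinations


def snp_columns(seqs):
    """Return alignment positions at which the sequences are not all identical.

    Hypothetical assumption: seqs are pre-aligned, equal-length strings.
    """
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def partition_key(column):
    """Relabel alleles by order of first appearance, e.g. "CCTT" -> (0, 0, 1, 1).

    Two columns with the same key split the sequences identically,
    i.e. their variants co-segregate and belong to one haplotype.
    """
    labels = {}
    return tuple(labels.setdefault(allele, len(labels)) for allele in column)


def group_haplotypes(seqs):
    """Map each partition key (haplotype) to the SNP positions that share it."""
    groups = {}
    for pos in snp_columns(seqs):
        column = [s[pos] for s in seqs]
        groups.setdefault(partition_key(column), []).append(pos)
    return groups


def n_classes(keys, n):
    """Count the distinct sequence classes produced by querying the given haplotypes."""
    return len({tuple(key[i] for key in keys) for i in range(n)})


def minimal_marker_sets(seqs):
    """Return (max classes, every smallest haplotype combination reaching it).

    Brute force (an assumed stand-in for SPR Opt's combinatoric analysis):
    tries all combinations of size 1, 2, ... and stops at the first size
    that separates as many classes as using every haplotype.
    """
    groups = group_haplotypes(seqs)
    keys = list(groups)
    best = n_classes(keys, len(seqs))
    for k in range(1, len(keys) + 1):
        hits = [c for c in combinations(keys, k) if n_classes(c, len(seqs)) == best]
        if hits:
            # Report each haplotype by the SNP positions it contains.
            return best, [[groups[key] for key in combo] for combo in hits]
    return best, []  # no SNPs: all sequences are identical


seqs = ["ACGTAC", "ACGTTC", "TCGAAC", "TCGATC"]
print(group_haplotypes(seqs))     # {(0, 0, 1, 1): [0, 3], (0, 1, 0, 1): [4]}
print(minimal_marker_sets(seqs))  # (4, [[[0, 3], [4]]])
```

In this toy input, positions 0 and 3 co-segregate and form one haplotype, position 4 forms another, and both haplotypes are needed to tell all four sequences apart, so two assays would suffice. The target is the number of classes reachable with all haplotypes rather than the number of sequences, because identical sequences can never be separated by any marker.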