Rapid advances in the genomic sequencing of bacteria and viruses over the past few years have made it possible to consider sequencing the genomes of all pathogens that affect humans and the crops and livestock upon which our lives depend. Recent events make it imperative that full genome sequencing be accomplished as soon as possible for pathogens that could be used as weapons of mass destruction or disruption. This sequence information must be exploited to provide rapid and accurate diagnostics that identify pathogens and distinguish them from harmless near-neighbours and hoaxes. The Chem-Bio Non-Proliferation (CBNP) programme of the US Department of Energy (DOE) began a large-scale pathogen detection effort in early 2000, when it was announced that the DOE would be providing bio-security at the 2002 Winter Olympic Games in Salt Lake City, Utah. Our team at the Lawrence Livermore National Laboratory (LLNL) was given the task of developing reliable and validated assays for a number of the most likely bioterrorist agents. The short timeline led us to devise a novel system that used whole-genome comparison methods to focus rapidly on those parts of the pathogen genomes that had a high probability of being unique. Assays developed with this approach have been validated by the Centers for Disease Control (CDC). They were used at the 2002 Winter Olympics, have entered the public health system, and have been in continual use for non-publicised aspects of homeland defence since autumn 2001. Assays have been developed for all major threat list agents for which adequate genomic sequence is available, as well as for other pathogens requested by various government agencies. Collaborations with developers of comparative genomics algorithms have enabled our LLNL team to make major advances in pathogen detection, since many of the existing tools simply did not scale well enough to be of practical use for this application.
We hope that discussing a real-life practical application of comparative genomics algorithms will help spur algorithm developers to tackle some of the many problems that remain to be addressed. Solutions to these problems will advance a wide range of biological disciplines, only one of which is pathogen detection. For example, exploring evolution and phylogenetics, annotating gene-coding regions, predicting and understanding gene function and regulation, and untangling gene networks all rely on tools for aligning multiple sequences, detecting gene rearrangements and duplications, and visualising genomic data. Two key problems that currently need improved solutions are: (1) aligning incomplete, fragmentary sequences (eg draft genome contigs or arbitrary genome regions) with both complete genomes and other fragmentary sequences; and (2) ordering, aligning and visualising non-colinear gene rearrangements and inversions, in addition to the colinear alignments handled by current tools.
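
As a purely illustrative aside, the idea of focusing on parts of a pathogen genome that are likely to be unique can be sketched with a toy k-mer comparison. This is not the system described here, which relied on whole-genome comparison tools; it is a minimal sketch under stated assumptions. The function names (`reverse_complement`, `kmers`, `candidate_unique_regions`) and the parameter `k` are hypothetical, and the approach of collecting every near-neighbour k-mer in memory would not scale to real genome collections.

```python
from typing import Iterable, List, Set, Tuple

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_unique_regions(
    target: str, neighbours: Iterable[str], k: int = 8
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of `target` spanned by runs of
    consecutive k-mers that occur in no neighbour sequence on either strand.

    Hypothetical helper for illustration only: a real pipeline would use a
    scalable genome aligner and would then screen candidates further.
    """
    # Collect every k-mer seen in any near-neighbour, on both strands.
    background: Set[str] = set()
    for seq in neighbours:
        seq = seq.upper()
        background |= kmers(seq, k)
        background |= kmers(reverse_complement(seq), k)

    target = target.upper()
    regions: List[Tuple[int, int]] = []
    run_start = None  # start index of the current run of unique k-mers
    last = 0          # start index of the most recent unique k-mer
    for i in range(len(target) - k + 1):
        if target[i:i + k] not in background:
            if run_start is None:
                run_start = i
            last = i
        elif run_start is not None:
            # The run ended: record the interval covered by its k-mers.
            regions.append((run_start, last + k))
            run_start = None
    if run_start is not None:
        regions.append((run_start, last + k))
    return regions


if __name__ == "__main__":
    toy_target = "A" * 8 + "CGTACGGT" + "A" * 8
    toy_neighbour = "A" * 20
    print(candidate_unique_regions(toy_target, [toy_neighbour], k=4))
```

Each reported interval is only a candidate. Because a k-mer counts as unique when it differs from the background at even one base, the interval edges include flanking bases shared with the near-neighbours, and any candidate would still need to be checked against wider sequence databases and validated in the laboratory.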