ELM: enhanced lowest common ancestor based method for detecting a pathogenic virus from a large sequence dataset

Background: Emerging viral diseases, most of which are caused by the transmission of viruses from animals to humans, pose a threat to public health. Discovering pathogenic viruses through surveillance is the key to preparedness for this potential threat. Next generation sequencing (NGS) helps us to identify viruses without the design of a specific PCR primer. The major task in NGS data analysis is taxonomic identification for vast numbers of sequences. However, taxonomic identification via a BLAST search against all the known sequences is a computational bottleneck.

Description: Here we propose an enhanced lowest-common-ancestor based method (ELM) to effectively identify viruses from massive sequence data. To reduce the computational cost, ELM uses a customized database composed only of viral sequences for the BLAST search. At the same time, ELM adopts a novel criterion to suppress the rise in false positive assignments caused by the small database. As a result, identification by ELM is more than 1,000 times faster than the conventional methods without loss of accuracy.

Conclusions: We anticipate that ELM will contribute to direct diagnosis of viral infections. The web server and the customized viral database are freely available at http://bioinformatics.czc.hokudai.ac.jp/ELM/.
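
For readers unfamiliar with the underlying idea, the generic lowest-common-ancestor (LCA) assignment that ELM builds on can be sketched as follows. This is only an illustration of plain LCA assignment, not ELM itself: the abstract does not specify ELM's novel false-positive criterion, so none is implemented here. The toy taxonomy `PARENT` and the function names `lineage` and `lowest_common_ancestor` are hypothetical, and the list of hit taxa stands in for the taxa of a read's BLAST hits.

```python
from typing import Dict, List, Optional

# Hypothetical, heavily simplified taxonomy (child -> parent), for illustration only.
PARENT: Dict[str, str] = {
    "Influenza A virus": "Alphainfluenzavirus",
    "Alphainfluenzavirus": "Orthomyxoviridae",
    "Influenza B virus": "Betainfluenzavirus",
    "Betainfluenzavirus": "Orthomyxoviridae",
    "Orthomyxoviridae": "Viruses",
    "Lassa virus": "Arenaviridae",
    "Arenaviridae": "Viruses",
    "Viruses": "root",
}


def lineage(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Return the path from the top of the taxonomy down to `taxon`."""
    path = [taxon]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


def lowest_common_ancestor(taxa: List[str], parent: Dict[str, str]) -> Optional[str]:
    """Return the deepest node shared by the lineages of all hit taxa.

    Returns None when there are no hits or the lineages share no node.
    """
    if not taxa:
        return None
    lineages = [lineage(t, parent) for t in taxa]
    lca = None
    # Walk all lineages in parallel from the top and stop at the first disagreement.
    for nodes in zip(*lineages):
        if len(set(nodes)) != 1:
            break
        lca = nodes[0]
    return lca


if __name__ == "__main__":
    # A read whose hits span two influenza species is assigned at family level.
    print(lowest_common_ancestor(["Influenza A virus", "Influenza B virus"], PARENT))
    # A read whose hits span unrelated families can only be assigned to "Viruses".
    print(lowest_common_ancestor(["Influenza A virus", "Lassa virus"], PARENT))
```

A read with a single hit taxon is assigned to that taxon directly, while hits scattered across distant lineages push the assignment toward the top of the taxonomy, which is how LCA assignment trades specificity for caution.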
