MePIC, metagenomic pathogen identification for clinical specimens.

Next-generation DNA sequencing technologies have enabled a new method of identifying the causative agents of infectious diseases. The analysis comprises three steps. First, DNA/RNA is extracted from a specimen that contains the pathogen, human tissue and commensal microorganisms, and is sequenced extensively. Second, the sequenced reads are matched against a database of known sequences, and the organism from which each read was derived is inferred. Last, the percentages of the organisms' genomic sequences in the specimen (i.e., the metagenome) are estimated, and the pathogen is identified. The first and last steps have become easy owing to the development of benchtop sequencers and metagenomic software. To facilitate the middle step, which requires computational resources and skill, we developed a cloud-computing pipeline, MePIC: "Metagenomic Pathogen Identification for Clinical specimens." In the pipeline, unnecessary bases are trimmed off the reads, and human reads are removed. The remaining reads are then searched against the database of known nucleotide sequences for similar sequences. The search is drastically sped up by using a cloud-computing system. The webpage interface can be used easily by clinicians and epidemiologists. We believe that the use of the MePIC pipeline will promote metagenomic pathogen identification and improve the understanding of infectious diseases.
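
The read-processing logic described above can be illustrated with a minimal Python sketch on toy data. This is not MePIC's actual implementation: the function names, the quality threshold of 20, and the example organism labels are all hypothetical, and the database search that assigns each read to an organism is assumed to have been done already.

```python
from collections import Counter


def trim_read(seq, quals, min_q=20):
    """Trim trailing bases whose quality score is below min_q.

    min_q=20 is an illustrative threshold, not a value taken from MePIC.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


def organism_percentages(assignments, host="Homo sapiens"):
    """Return the percentage of non-host reads assigned to each organism.

    `assignments` holds one organism label per read, as would be inferred
    from a similarity search against a database of known sequences.
    """
    kept = [org for org in assignments if org != host]  # remove human reads
    total = len(kept)
    if total == 0:
        return {}
    counts = Counter(kept)
    return {org: 100.0 * n / total for org, n in counts.items()}


# Toy example: ten reads, six of which are of human origin.
trimmed = trim_read("ACGTAC", [30, 30, 30, 30, 10, 5])
assignments = (
    ["Homo sapiens"] * 6
    + ["Francisella sp."] * 3
    + ["Escherichia coli"] * 1
)
profile = organism_percentages(assignments)
```

In this toy case the two low-quality trailing bases are trimmed from the read, and the organism accounting for the largest share of non-human reads would be the first candidate pathogen to examine, although real specimens require interpretation by a clinician or epidemiologist.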
