Studies have shown that shotgun metagenomic sequencing facilitates the evaluation of diverse viruses, bacteria, and eukaryotic microbes and helps explore their abundances in complex samples. Because of the large number of sequences involved and the overall computational complexity, analyzing these data with traditional database sequence comparison approaches is time-consuming. Deep learning has been widely used to solve classification problems, including those in bioinformatics, and has proven accurate and efficient on large-scale datasets. This work explores how a long short-term memory (LSTM) network can learn sequential genome patterns in order to detect pathogens in metagenome data. Our experimental results show that the approach achieves accuracy similar to that of the conventional BLAST method while running about 36 times faster.
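To make the idea of an LSTM reading a sequence concrete, the following is a minimal pure-Python sketch of a single LSTM layer scoring one read. It is not the network evaluated in this work: the one-hot nucleotide encoding, the hidden size, the sigmoid output layer, and the helper names (`one_hot`, `init_params`, `lstm_step`, `score_read`) are all assumptions made for illustration, and the weights are random rather than trained, so the score it returns carries no biological meaning.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def one_hot(base):
    """Encode one nucleotide as a 4-element vector; other symbols (e.g. N) map to all zeros."""
    return [1.0 if base == n else 0.0 for n in NUCLEOTIDES]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(input_size, hidden_size, seed=0):
    """Create small random (untrained) weights; hypothetical helper for this sketch."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (g).
    gates = {
        name: {"W": matrix(hidden_size, input_size + hidden_size),
               "b": [0.0] * hidden_size}
        for name in "ifog"
    }
    out = {"w": [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)], "b": 0.0}
    return gates, out


def affine(W, b, v):
    """Compute W @ v + b with plain lists."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(gates, x, h, c):
    """Apply the standard LSTM cell update for one time step."""
    v = x + h  # concatenate the current input with the previous hidden state
    i = [sigmoid(z) for z in affine(gates["i"]["W"], gates["i"]["b"], v)]
    f = [sigmoid(z) for z in affine(gates["f"]["W"], gates["f"]["b"], v)]
    o = [sigmoid(z) for z in affine(gates["o"]["W"], gates["o"]["b"], v)]
    g = [math.tanh(z) for z in affine(gates["g"]["W"], gates["g"]["b"], v)]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def score_read(read, gates, out, hidden_size):
    """Run the LSTM over a read, one base per step, and squash the final state to (0, 1)."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for base in read.upper():
        h, c = lstm_step(gates, one_hot(base), h, c)
    logit = sum(w * hj for w, hj in zip(out["w"], h)) + out["b"]
    return sigmoid(logit)


if __name__ == "__main__":
    HIDDEN = 8  # assumed hidden size, chosen only to keep the example small
    gates, out = init_params(input_size=4, hidden_size=HIDDEN)
    print(score_read("ACGTNACGTTGA", gates, out, HIDDEN))
```

In a trained model the final score would be read as the probability that a read belongs to the target class. In practice one would use a deep learning framework and fit the weights on labeled reads rather than hand-rolling the cell as done here.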