Applying Neural Network to Reconstruction of Phylogenetic Tree

Reconstructing phylogenetic trees from biological sequences is a fundamental step in molecular biology, but it is computationally expensive. Our goal is to use a neural network to learn the heuristic strategy of a phylogenetic tree reconstruction algorithm. We propose an attention model that learns heuristic strategies for constructing the circular orderings associated with phylogenetic trees. We represent biological sequences as alignment-free k-mer frequency vectors and train the attention model on unlabeled sequence data sets through reinforcement learning. Compared with traditional methods, our approach is alignment-free and can be extended to large-scale data in a computationally efficient way. With the rapid growth of public biological sequence data, our method provides a potential way to reconstruct phylogenetic trees.
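To make the alignment-free representation concrete, the following is a minimal sketch of a k-mer frequency vector for a DNA sequence. The abstract does not specify the value of k, the alphabet, the normalization, or how ambiguous bases are handled, so the function name `kmer_frequency_vector`, the default `k=3`, the `ACGT` alphabet, and the choice to skip windows containing other characters are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequency_vector(sequence, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    Assumptions (not taken from the paper): overlapping windows,
    frequencies normalized to sum to 1, and windows containing
    characters outside `alphabet` (e.g. 'N') are ignored.
    """
    # Enumerate all |alphabet|^k possible k-mers in a fixed order.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    seq = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows made entirely of alphabet symbols contribute.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]
```

With these defaults, each sequence maps to a vector of fixed length 4^k (64 for k = 3) regardless of sequence length, so sequences can be compared or fed to a model without first being aligned.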
