The Effect of Machine Learning Algorithms on Metagenomics Gene Prediction

The development of next-generation sequencing has facilitated the study of metagenomics. Computational gene prediction aims to locate genes in a given DNA sequence. Gene prediction in metagenomics is challenging because the data are short and fragmented. Our previous framework, minimum redundancy maximum relevance with support vector machines (mRMR-SVM), produced promising results in metagenomics gene prediction. In this paper, we review available metagenomics gene prediction programs and study the effect of the machine learning approach on gene prediction by altering the underlying machine learning algorithm in our previous framework. Overall, SVM produces the highest accuracy in tests performed on a simulated dataset.
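To make the task of locating genes in a DNA sequence concrete, the following is a minimal sketch of one common preliminary step: enumerating candidate open reading frames (ORFs) in all six reading frames of a fragment. It is an illustration under stated assumptions, not the mRMR-SVM framework. The function names, the `min_len` threshold, and the restriction to complete ORFs (an ATG start through to a stop codon) are all hypothetical choices made here. Metagenomic fragments often contain ORFs truncated at either end, which this sketch ignores.

```python
from typing import List, Tuple

# Assumption: standard start codon ATG and the three standard stop codons.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 60) -> List[Tuple[str, int, int]]:
    """List complete ORFs as (strand, start, end) tuples.

    Coordinates are 0-based and end-exclusive, measured on the strand
    being scanned ("+" is the input, "-" is its reverse complement).
    Only ORFs of at least min_len nucleotides, stop codon included,
    are reported.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG of the open ORF
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, start, i + 3))
                    start = None  # close the ORF and keep scanning
    return orfs


if __name__ == "__main__":
    # Toy fragment holding one short ORF: ATG AAA TTT GGG TAA
    print(find_orfs("CCATGAAATTTGGGTAACC", min_len=15))
```

In a machine learning pipeline of the kind the abstract describes, candidates like these would then be turned into feature vectors and scored by a classifier as coding or non-coding; that stage is not shown here.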
