Top-down strategies for hierarchical classification of transposable elements with neural networks

Transposable elements (TEs) are DNA sequences that can move from one location to another within the genome of a cell. They contribute to genetic variability and can alter the function of genes. Classifying these elements correctly is crucial to understanding their role in the evolution of species. In this paper, we investigate TE classification as a hierarchical classification problem using machine learning. We present new hierarchical datasets suitable for machine learning methods, along with new top-down hierarchical classification strategies based on neural networks. We compare our strategies with existing ones from the literature and evaluate them using measures specific to hierarchical problems. Experiments show that our proposal achieves results better than or competitive with those of other methods in the literature.
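
To make the terms "top-down" and "measures specific to hierarchical problems" concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes a toy class hierarchy (the `PARENT` map and its class names are illustrative), a hypothetical `scorer(x, node)` function standing in for a trained per-node neural network, mandatory leaf-node prediction, and the ancestor-augmented hierarchical precision, recall, and F-measure that are common in the hierarchical classification literature. The abstract does not state which measures were actually used.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical toy hierarchy (child -> parent); names are illustrative only.
PARENT: Dict[str, str] = {
    "ClassI": "root",
    "ClassII": "root",
    "LTR": "ClassI",
    "LINE": "ClassI",
    "TIR": "ClassII",
    "Copia": "LTR",
    "Gypsy": "LTR",
}


def children(node: str) -> List[str]:
    """Return the direct children of a node in a fixed (sorted) order."""
    return sorted(c for c, p in PARENT.items() if p == node)


def ancestors_inclusive(node: str) -> Set[str]:
    """Return the node together with all of its ancestors, excluding the root."""
    path: Set[str] = set()
    while node != "root":
        path.add(node)
        node = PARENT[node]
    return path


def predict_top_down(x: object, scorer: Callable[[object, str], float]) -> str:
    """Descend from the root, keeping the best-scoring child at each level.

    Assumes mandatory leaf-node prediction: descent stops only at a leaf.
    """
    node = "root"
    while children(node):
        node = max(children(node), key=lambda c: scorer(x, c))
    return node


def hierarchical_prf(
    true_labels: Sequence[str], pred_labels: Sequence[str]
) -> Tuple[float, float, float]:
    """Hierarchical precision, recall and F1 over ancestor-augmented label sets."""
    inter = total_pred = total_true = 0
    for t, p in zip(true_labels, pred_labels):
        t_set, p_set = ancestors_inclusive(t), ancestors_inclusive(p)
        inter += len(t_set & p_set)
        total_pred += len(p_set)
        total_true += len(t_set)
    hp = inter / total_pred if total_pred else 0.0
    hr = inter / total_true if total_true else 0.0
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf


# Stub scores standing in for per-node classifier outputs (assumption).
STUB_SCORES = {
    "ClassI": 0.9, "ClassII": 0.1, "LTR": 0.7, "LINE": 0.3,
    "TIR": 1.0, "Copia": 0.2, "Gypsy": 0.8,
}
predicted = predict_top_down(None, lambda x, c: STUB_SCORES[c])
hp, hr, hf = hierarchical_prf(["Copia", "LINE"], [predicted, "LINE"])
```

In this sketch a prediction of `Gypsy` for a true `Copia` still earns partial credit, because both share the ancestors `ClassI` and `LTR`. That partial credit is the reason hierarchical measures are preferred over flat accuracy in this setting.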
