Hierarchical Classification of Transposable Elements with a Weighted Genetic Algorithm

Most Machine Learning (ML) research addresses Flat Classification, in which each instance is usually associated with one class drawn from a small set of classes. In some problems, however, instances must be assigned to several classes simultaneously, and these classes are arranged in a hierarchical structure. This problem, called Hierarchical Classification (HC), has received special attention in fields such as Bioinformatics. In this context, a topic that has gained attention is the classification of Transposable Elements (TEs), which are DNA fragments capable of moving within the genome of their hosts. In this paper, we propose a novel hierarchical method based on Genetic Algorithms (GAs) that generates HC rules and classifies TEs at multiple levels of their taxonomy. The proposed method, called Hierarchical Classification with a Weighted Genetic Algorithm (HC-WGA), uses a Weighted Sum approach to handle the accuracy-interpretability trade-off, a common and still relevant problem in both ML and Bioinformatics. To the best of our knowledge, this is the first HC method to use such an approach. Experiments on two popular TE datasets showed that our method achieves results competitive with most state-of-the-art HC methods, with the advantage of producing an interpretable model.
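As an illustration of the Weighted Sum idea, the sketch below collapses an accuracy term and an interpretability term into one scalar fitness for a candidate rule. It is not the HC-WGA fitness function: the Rule fields, the use of rule length as a proxy for interpretability, and the weights w_acc and w_int are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical summary of one candidate classification rule."""
    n_conditions: int  # number of attribute tests in the rule antecedent
    accuracy: float    # predictive score on training data, in [0, 1]


def weighted_fitness(rule: Rule, max_conditions: int,
                     w_acc: float = 0.8, w_int: float = 0.2) -> float:
    """Collapse two objectives into a single scalar with a weighted sum.

    Interpretability is approximated here by rule shortness (an assumption):
    a rule with fewer conditions scores closer to 1.
    """
    if max_conditions <= 0:
        raise ValueError("max_conditions must be positive")
    interpretability = 1.0 - rule.n_conditions / max_conditions
    return w_acc * rule.accuracy + w_int * interpretability


# Two rules with equal accuracy: the shorter one receives the higher fitness.
short_rule = Rule(n_conditions=2, accuracy=0.9)
long_rule = Rule(n_conditions=8, accuracy=0.9)
print(weighted_fitness(short_rule, max_conditions=10))
print(weighted_fitness(long_rule, max_conditions=10))
```

With a weighted sum, the trade-off is fixed in advance by the choice of weights, so a GA can rank individuals by a single number instead of maintaining a Pareto front.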
