A hierarchical multi-label classification method based on neural networks for gene function prediction

Abstract. Gene function prediction, the assignment of biological or biochemical functions to genes, remains a challenging problem in modern biology. A gene may have many functions simultaneously, and these functions are organized into a hierarchy, such as the directed acyclic graph (DAG) of the Gene Ontology (GO). Because of these characteristics, gene function prediction can be treated as a typical hierarchical multi-label classification (HMC) task. This article proposes a novel neural-network-based HMC method for predicting gene function with GO. The method takes a local approach, decomposing the HMC task into a set of subtasks, and uses three strategies to improve performance. First, to address class imbalance when building the training data set for each class, a negative-instance selection policy and the SMOTE approach are used to preprocess each imbalanced training set. Second, a dedicated multi-layer perceptron (MLP) is designed for each node in GO. Third, a post-processing method based on a Bayesian network guarantees that the results are consistent with the hierarchy constraint. In experiments comparing it with two other state-of-the-art methods, the proposed HMC-MLPN method appears to be a promising approach to gene function prediction.
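
Two of the ideas named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The first function shows the core SMOTE idea (a synthetic minority sample is an interpolation between a minority sample and one of its nearest minority neighbours). The second shows one simple way to make per-node scores respect a DAG hierarchy by capping each node's score at the minimum of its parents' scores; this is a deliberately simpler stand-in for the Bayesian-network post-processing the abstract describes. The function names, the toy GO-like DAG, and all numbers are assumptions made up for illustration.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

def enforce_hierarchy(scores, parents):
    """Cap each node's score at the minimum corrected score of its parents,
    so no term ends up more confident than any of its ancestors.
    (Simplified stand-in, NOT the Bayesian-network method of the paper.)"""
    fixed = {}

    def resolve(node):
        if node not in fixed:
            cap = min((resolve(p) for p in parents.get(node, [])), default=1.0)
            fixed[node] = min(scores[node], cap)
        return fixed[node]

    for node in scores:
        resolve(node)
    return fixed

# Hypothetical toy DAG: "C" has two parents, as is allowed in GO.
toy_parents = {"A": ["root"], "B": ["root"], "C": ["A", "B"]}
toy_scores = {"root": 0.9, "A": 0.4, "B": 0.8, "C": 0.7}
consistent = enforce_hierarchy(toy_scores, toy_parents)

# Hypothetical minority-class feature vectors for one GO term.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
extra = smote_like(toy_minority, n_new=5)
```

In the toy DAG, the score of "C" is lowered from 0.7 to 0.4 because its parent "A" scores only 0.4. In a local approach of the kind the abstract describes, a correction like this would run after the per-node classifiers have produced their outputs.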
