Extreme learning machine prediction under high class imbalance in bioinformatics

Class imbalance in machine learning occurs when one class has significantly fewer training instances than another. In bioinformatics, this problem arises in the computational prediction of novel microRNAs (miRNAs) within a full genome. The known miRNA precursors (pre-miRNAs) are usually only a few in comparison to the hundreds of thousands of potential candidates, which makes this task a highly imbalanced classification problem. High class imbalance is well known to degrade the performance of classical supervised machine learning classifiers, so the imbalance must be explicitly considered. Extreme Learning Machine (ELM) is a supervised artificial neural network model that has gained interest in recent years because of its fast learning speed and good performance. In this work, we propose a novel approach to overcome the high class imbalance in pre-miRNA prediction data, in which ELMs are used to predict good pre-miRNA candidates without needing balanced data sets. The proposal was validated on real datasets with several class imbalance levels. The results showed the superiority of the ELM approach over very recent state-of-the-art methods under the same experimental conditions.
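To make the effect of high class imbalance concrete, the following minimal sketch shows why plain accuracy is misleading when positives are rare. The counts (10 known pre-miRNAs among 10,000 candidates) and the function names are hypothetical and chosen only for illustration; they are not taken from the datasets or the method used in this work.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = pre-miRNA)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Return accuracy, sensitivity, specificity and their geometric mean."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
    }


# Hypothetical imbalance of 1:999 (assumed numbers, for illustration only).
y_true = [1] * 10 + [0] * 9990

# A trivial classifier that labels every candidate as negative.
y_all_negative = [0] * len(y_true)

majority = summarize(y_true, y_all_negative)
print(majority)
```

In this sketch the trivial classifier reaches very high accuracy while finding no pre-miRNA at all, which is why measures that account for both classes, such as sensitivity or the geometric mean of sensitivity and specificity, are more informative in this setting.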
