Multi-level gene/miRNA feature selection using deep belief nets and active learning

Selecting the most discriminative genes/miRNAs is an important task in bioinformatics, both to improve disease classifiers and to mitigate the curse of dimensionality. Traditional feature selection methods choose genes/miRNAs based on their individual features, regardless of how they perform together. Considering group features instead of individual ones gives a better basis for selecting the most informative genes/miRNAs. Recently, deep learning has proven able to represent data at multiple levels of abstraction, allowing better discrimination between classes. However, deep learning is not yet widely used for feature selection in bioinformatics. In this paper, a novel multi-level feature selection approach named MLFS is proposed for selecting genes/miRNAs based on expression profiles. The approach combines deep learning and active learning. Moreover, an extension of the technique to miRNAs is presented, which takes into account the biological relation between miRNAs and genes. Experimental results show that the approach outperforms classical feature selection methods in F1-measure by 9% for hepatocellular carcinoma (HCC), by 6% for lung cancer, and by around 10% for breast cancer. The results also show an improvement in F1-measure over recent related work [1], [2].
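Since all reported gains are stated in F1-measure, a minimal sketch of how that metric is computed may help when reading the numbers. The abstract does not say whether F1 is computed per class or averaged, so the sketch below assumes a binary setting with one positive class (for example, tumor versus normal); the function name `f1_measure` and the toy labels are hypothetical and not taken from the paper.

```python
def f1_measure(y_true, y_pred, positive=1):
    """Return the F1-measure (harmonic mean of precision and recall)
    for the class labelled `positive`. Binary setting is an assumption."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = disease sample, 0 = normal sample.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_measure(y_true, y_pred)
print(round(score, 3))
```

Because F1 balances precision and recall, it is less sensitive than plain accuracy to class imbalance, which is common in expression datasets with few samples per class.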

[1] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.

[2] Jen-Tzung Chien, et al. Introduction to the Special Section on Deep Learning for Speech and Language Processing, 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[3] Manfred Huber, et al. Using deep learning to enhance cancer diagnosis and classification, 2013.

[4] Anton J. Enright, et al. Correction: Human MicroRNA Targets, 2005, PLoS Biology.

[5] Yan Liu, et al. Semiconducting bilinear deep learning for incomplete image recognition, 2012, ICMR '12.

[6] Mira Ayadi, et al. Gene Expression Classification of Colon Cancer into Molecular Subtypes: Characterization, Validation, and Prognostic Value, 2013, PLoS Medicine.

[7] Thibault Helleputte, et al. Robust biomarker identification for cancer diagnosis with ensemble feature selection methods, 2010, Bioinform.

[8] Fillia Makedon, et al. Application of Relief-F feature filtering algorithm to selecting informative genes for cancer classification using microarray data, 2004.

[9] Burr Settles, et al. Active Learning Literature Survey, 2009.

[10] Satoru Miyano, et al. A Top-r Feature Selection Algorithm for Microarray Gene Expression Data, 2012, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[11] Mohamed A. Ismail, et al. miRNA and gene expression based cancer classification using self-learning and co-training approaches, 2013, 2013 IEEE International Conference on Bioinformatics and Biomedicine.

[12] Hiroshi Tanaka, et al. Identification of pathogenesis-related microRNAs in hepatocellular carcinoma by expression profiling, 2012, Oncology Letters.