Exploring Features and Classifiers to Classify MicroRNA Expression Profiles of Human Cancer

Recently, small non-coding RNAs known as microRNAs (miRNAs) have drawn considerable attention for their role in gene regulation and various biological processes. miRNA expression profiles are surprisingly informative, reflecting the malignancy state of tissues. In this paper, we explore a wide range of features and classifiers through a comparative study of the most promising feature selection methods and machine learning classifiers. We use the expression profiles of 217 miRNAs from 186 samples covering multiple human cancers. Pearson's and Spearman's correlation coefficients, Euclidean distance, cosine coefficient, information gain, mutual information, and signal-to-noise ratio are used for feature selection. A backpropagation neural network, a support vector machine, and k-nearest neighbor are used for classification. Experimental results indicate that k-nearest neighbor with the cosine coefficient produces the best result, a recognition rate of 95.0% on the test data.
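
The best-performing combination named above, cosine-coefficient feature selection followed by k-nearest neighbor, can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the task is reduced to two classes, and all function names (`cosine_scores`, `select_top_features`, `knn_predict`) are hypothetical. Scoring each feature against two "ideal" class-indicator vectors, and using cosine similarity as the neighbor metric, are assumptions made for illustration.

```python
import numpy as np

def cosine_scores(X, y):
    """Score each feature by its cosine coefficient with an ideal class vector.

    Assumption: a feature is compared with two ideal binary vectors (1 for
    class 1 samples, and 1 for class 0 samples) and keeps the larger value.
    """
    ideal_pos = (y == 1).astype(float)
    ideal_neg = (y == 0).astype(float)
    norms = np.linalg.norm(X, axis=0)
    norms = np.where(norms == 0, 1.0, norms)  # guard against all-zero features
    cos_pos = (X.T @ ideal_pos) / (norms * np.linalg.norm(ideal_pos))
    cos_neg = (X.T @ ideal_neg) / (norms * np.linalg.norm(ideal_neg))
    return np.maximum(cos_pos, cos_neg)

def select_top_features(X, y, n_features):
    """Return the indices of the n_features highest-scoring features."""
    return np.argsort(cosine_scores(X, y))[::-1][:n_features]

def knn_predict(X_train, y_train, X_test, k=3):
    """Binary k-nearest-neighbor vote using cosine similarity (assumed metric)."""
    a = X_test / np.linalg.norm(X_test, axis=1, keepdims=True)
    b = X_train / np.linalg.norm(X_train, axis=1, keepdims=True)
    sims = a @ b.T                               # (n_test, n_train) similarities
    nearest = np.argsort(-sims, axis=1)[:, :k]   # k most similar training samples
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

# Synthetic stand-in for an expression matrix: 80 samples x 20 features.
rng = np.random.default_rng(0)
y = np.tile([0, 1], 40)
X = rng.normal(5.0, 0.5, size=(80, 20))
X[y == 1, 0:3] += 3.0   # features 0-2 are elevated in class 1
X[y == 0, 3:6] += 3.0   # features 3-5 are elevated in class 0

X_train, y_train = X[:60], y[:60]
X_test, y_test = X[60:], y[60:]

# Select features on the training split only, then classify the test split.
selected = select_top_features(X_train, y_train, n_features=6)
pred = knn_predict(X_train[:, selected], y_train, X_test[:, selected], k=3)
accuracy = float((pred == y_test).mean())
print("selected features:", sorted(selected.tolist()))
print("test accuracy:", accuracy)
```

Selecting features on the training split only keeps the test samples from influencing the ranking, so the reported accuracy stays an honest estimate. A multi-class setting such as the one in the paper would need one ideal vector per class and a plurality vote instead of the binary threshold used here.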
