miRNA and gene expression based cancer classification using self-learning and co-training approaches

A number of attempts to classify cancer samples using miRNA or gene expression profiles have been reported in the literature. However, semi-supervised learning models have only recently been introduced to exploit the large number of unlabeled expression profiles to enhance sample classification. Combining miRNA and gene expression sets is important because together they provide more information on the characteristics of cancer samples. The use of both labeled and unlabeled miRNA and gene expression sets to enhance sample classification has not yet been explored. In this paper, two semi-supervised machine learning approaches, namely self-learning and co-training, are adapted to enhance the quality of cancer sample classification. In self-learning, the miRNA-based and gene-based classifiers are enhanced independently, whereas in co-training, miRNA and gene expression profiles are used simultaneously to provide different views of the cancer samples. The approaches were evaluated using breast cancer, hepatocellular carcinoma (HCC) and lung cancer expression sets. Results show up to 20% improvement in F1-measure over Random Forests and SVM classifiers. Co-training also outperforms the Low Density Separation (LDS) approach by around 25% in F1-measure on breast cancer.
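The two schemes described above can be illustrated with a minimal sketch. It is not the paper's implementation: the base learner here is a simple nearest-centroid classifier chosen only to keep the example dependency-free (the paper compares against Random Forests and SVM), and the confidence score, the `threshold` and `max_iter` parameters, and the rule for accepting a co-training sample (both views agree and at least one is confident) are assumptions made for illustration.

```python
import math

def centroids(X, y):
    """Train the stand-in base learner: one mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(model, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {c: math.exp(-math.dist(x, m)) for c, m in model.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())

def self_learning(X_lab, y_lab, X_unl, threshold=0.9, max_iter=10):
    """Single view: repeatedly add confidently predicted unlabeled samples."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_iter):
        model = centroids(X, y)
        rest, added = [], False
        for x in pool:
            label, conf = predict(model, x)
            if conf >= threshold:
                X.append(x)
                y.append(label)
                added = True
            else:
                rest.append(x)
        pool = rest
        if not added:
            break
    return centroids(X, y)

def co_training(A_lab, B_lab, y_lab, A_unl, B_unl, threshold=0.9, max_iter=10):
    """Two views (e.g. miRNA = A, gene = B) of the same samples.

    Assumed acceptance rule: the two view classifiers agree on the label
    and at least one of them is confident.
    """
    A, B, y = list(A_lab), list(B_lab), list(y_lab)
    pool = list(zip(A_unl, B_unl))
    for _ in range(max_iter):
        model_a, model_b = centroids(A, y), centroids(B, y)
        rest, added = [], False
        for a, b in pool:
            label_a, conf_a = predict(model_a, a)
            label_b, conf_b = predict(model_b, b)
            if label_a == label_b and max(conf_a, conf_b) >= threshold:
                A.append(a)
                B.append(b)
                y.append(label_a)
                added = True
            else:
                rest.append((a, b))
        pool = rest
        if not added:
            break
    return centroids(A, y), centroids(B, y)
```

In this sketch, `self_learning` would be called once per expression type, so the miRNA and gene classifiers grow their training sets independently, while `co_training` takes paired miRNA and gene profiles of the same samples and lets each view's predictions extend the shared labeled set.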
