Handling Label Noise in Microarray Classification with One-Class Classifier Ensemble

Advances in high-throughput techniques, such as gene microarrays and protein chips, have had a major impact on contemporary biology and medicine. Because such data are high-dimensional and complex, they cannot be analyzed manually, so machine learning techniques play an important role in handling them. In this paper, we investigate the influence of label noise on the effectiveness of classification systems applied to microarray analysis. Popular methods have no mechanism for handling such difficulties, which are embedded in the nature of the data. To cope with this, we propose using one-class classifiers, which, unlike canonical methods, rely on objects drawn from a single class distribution only. They distinguish observations coming from the given class from any other possible examples that were unseen during the training step. Although they have less information available to dichotomize between classes, one-class models can easily learn the specific properties of a given data set and are robust to difficulties embedded in the nature of the data. We show that ensembles of one-class classifiers can give results as good as those of canonical multi-class classifiers while also coping with unexpected label noise in the data. Experimental investigations carried out on public data sets demonstrate the usefulness of the proposed approach.
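
The idea of building one group of one-class models per class and assigning a sample to the class with the highest support can be illustrated with a minimal sketch. This is not the authors' implementation. The centroid-based model stands in for a proper one-class classifier such as a one-class SVM, and the random feature subspaces, the names CentroidOneClass, OneClassEnsemble and make_noisy_data, the synthetic data and all parameter values are assumptions made purely for illustration.

```python
import math
import random
from statistics import fmean


class CentroidOneClass:
    """Toy one-class model (illustrative): a centroid plus a distance threshold."""

    def __init__(self, features, reject_fraction=0.1):
        self.features = list(features)          # indices of the features (genes) used
        self.reject_fraction = reject_fraction  # share of training objects left outside

    def _project(self, x):
        return [x[j] for j in self.features]

    def fit(self, rows):
        pts = [self._project(r) for r in rows]
        self.center = [fmean(col) for col in zip(*pts)]
        dists = sorted(math.dist(p, self.center) for p in pts)
        # Radius covers all but the most distant training objects, so a few
        # far-away (possibly mislabeled) samples do not stretch the boundary.
        keep = max(1, math.ceil(len(dists) * (1.0 - self.reject_fraction)))
        self.radius = max(dists[keep - 1], 1e-12)
        return self

    def support(self, x):
        # Support in (0, 1]; it decays with distance from the centroid.
        return math.exp(-math.dist(self._project(x), self.center) / self.radius)


class OneClassEnsemble:
    """One group of one-class models per class, each on a random feature subset."""

    def __init__(self, n_members=5, subspace_size=2, seed=0):
        self.n_members = n_members
        self.subspace_size = subspace_size
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_features = len(X[0])
        self.members = {}
        for label in sorted(set(y)):
            # Each model sees only the objects labeled with its own class.
            rows = [x for x, t in zip(X, y) if t == label]
            self.members[label] = [
                CentroidOneClass(rng.sample(range(n_features), self.subspace_size)).fit(rows)
                for _ in range(self.n_members)
            ]
        return self

    def predict(self, x):
        # Average the supports within each class group and pick the largest.
        scores = {
            label: fmean(m.support(x) for m in models)
            for label, models in self.members.items()
        }
        return max(scores, key=scores.get)


def make_noisy_data(seed=42, n_per_class=20, n_features=4, n_flipped=2):
    """Synthetic two-class data with a few deliberately wrong labels."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in (("tumor", 0.0), ("normal", 5.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 0.5) for _ in range(n_features)])
            y.append(label)
    # Label noise: relabel the last few "normal" samples as "tumor".
    for i in range(len(y) - n_flipped, len(y)):
        y[i] = "tumor"
    return X, y


X, y = make_noisy_data()
model = OneClassEnsemble().fit(X, y)
print(model.predict([0.0, 0.0, 0.0, 0.0]), model.predict([5.0, 5.0, 5.0, 5.0]))
```

In this sketch, the rejection fraction is what gives each model some tolerance to mislabeled training objects: samples far from the bulk of their nominal class fall outside the radius and therefore do not widen the class description.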
