A Fast Algorithm for Outlier Detection in Microarray

A Fast Outlier Sample Detection (FOSD) algorithm is proposed in this paper to recognize mislabeled or abnormal samples in microarray datasets. The proposed algorithm uses the CL-stability algorithm as its basic operator, with a machine learning method serving as the classifier. Outlier samples are detected according to the global stability of the samples. Experimental results show that the FOSD algorithm not only performs better than other existing algorithms but is also robust in detecting outlier samples in microarray datasets.
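The idea of scoring samples by classification stability can be illustrated with a small sketch. This is not the FOSD algorithm itself, whose details the abstract does not give. It is a generic perturbation scheme under stated assumptions: labels are binary (0/1), the classifier is a simple nearest-centroid rule standing in for whatever learning method is used, and the names `nearest_centroid_predict`, `stability_scores`, `flag_outliers`, and the `threshold` parameter are all hypothetical. Each sample is held out in turn, one other label is flipped at a time, and the sample's stability is the fraction of perturbed training sets for which the classifier still predicts its given label.

```python
from typing import List, Sequence


def nearest_centroid_predict(train_x: Sequence[Sequence[float]],
                             train_y: Sequence[int],
                             query: Sequence[float]) -> int:
    """Return the label whose class centroid is closest to `query`."""
    best_label, best_dist = -1, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def stability_scores(xs: Sequence[Sequence[float]],
                     ys: Sequence[int]) -> List[float]:
    """Assumed stability measure: for sample i, hold it out, flip one
    other label at a time, and count how often the prediction for i
    still matches its given label."""
    n = len(xs)
    scores = []
    for i in range(n):
        keep = [k for k in range(n) if k != i]
        train_x = [xs[k] for k in keep]
        agree = 0
        for j in keep:
            # Flip the binary label of sample j only.
            train_y = [1 - ys[k] if k == j else ys[k] for k in keep]
            if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
                agree += 1
        scores.append(agree / (n - 1))
    return scores


def flag_outliers(xs: Sequence[Sequence[float]],
                  ys: Sequence[int],
                  threshold: float = 0.5) -> List[int]:
    """Indices of samples whose stability falls below `threshold`
    (the threshold value is an arbitrary illustrative choice)."""
    return [i for i, s in enumerate(stability_scores(xs, ys)) if s < threshold]


# Toy data: two well-separated groups; the last sample sits in the
# first group but carries the second group's label.
toy_x = [[0, 0], [1, 0], [0, 1], [1, 1],
         [10, 10], [11, 10], [10, 11], [11, 11],
         [0.5, 0.5]]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(flag_outliers(toy_x, toy_y))
```

On this toy data only the deliberately mislabeled last sample receives a low stability score. The sketch retrains the classifier once per (held-out sample, flipped label) pair, so its cost grows quadratically with the number of samples; it makes no attempt to reproduce whatever speed-up the paper's method achieves.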
