Estimating Missing Value in Microarray Data Using Fuzzy Clustering and Gene Ontology

Microarray experiments usually generate data sets with multiple missing expression values, owing to a variety of experimental problems. In this paper, we propose a new and robust method, based on fuzzy clustering and gene ontology, for estimating missing values in microarray data. In the proposed method, missing values are imputed with values generated from cluster centers. To determine which genes are similar during the clustering process, we use biological knowledge obtained from gene ontology in addition to the gene expression values. We applied the proposed method to yeast cell cycle data with different percentages of missing entries and compared its estimation accuracy with that of several other methods. The experimental results indicate that the proposed method outperforms the other methods in terms of accuracy.
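To make the idea of imputing from cluster centers concrete, the following is a minimal sketch and not the paper's implementation. It assumes plain fuzzy c-means on expression values only, with distances computed over the observed entries of each gene (a partial-distance strategy), and it fills each missing entry with the membership-weighted combination of the cluster centers. The gene ontology term is omitted because the abstract does not say how it is combined with expression similarity. The function name `fcm_impute` and all of its parameters are hypothetical.

```python
import numpy as np

def fcm_impute(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fill NaN entries of X (genes x conditions) from fuzzy cluster centers.

    Illustrative sketch only: expression-based fuzzy c-means with partial
    distances; no gene ontology information is used here.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                  # True where a value is known
    X0 = np.where(observed, X, 0.0)          # placeholder zeros, always masked out
    n_genes, n_cond = X.shape
    n_obs = np.maximum(observed.sum(axis=1, keepdims=True), 1)

    # Random initial memberships; each gene's memberships sum to 1.
    rng = np.random.default_rng(seed)
    U = rng.random((n_genes, n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        W = U ** m
        # Cluster centers: weighted means over observed entries only.
        centers = (W.T @ X0) / np.maximum(W.T @ observed.astype(float), 1e-12)

        # Partial squared distance over observed coordinates, rescaled so
        # genes with different numbers of missing values are comparable.
        diff = (X0[:, None, :] - centers[None, :, :]) * observed[:, None, :]
        d2 = (diff ** 2).sum(axis=2) * (n_cond / n_obs)
        d2 = np.maximum(d2, 1e-12)           # avoid division by zero

        # Standard fuzzy c-means membership update.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    # Assumption: a missing entry is estimated as the membership-weighted
    # combination of the cluster centers for that condition.
    estimates = U @ centers
    return np.where(observed, X, estimates)


# Toy data: two well-separated groups of genes, one missing entry.
toy = np.array([
    [1.0, 1.0, 1.0],
    [1.0, 1.0, np.nan],
    [1.0, 1.0, 1.0],
    [9.0, 9.0, 9.0],
    [9.0, 9.0, 9.0],
    [9.0, 9.0, 9.0],
])
filled = fcm_impute(toy, n_clusters=2)
```

In this toy example the gene with the missing entry sits in the low-expression group, so its estimate is pulled toward that group's center. A gene-ontology-aware variant would additionally need a semantic similarity between genes, which this sketch does not attempt to model.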
