Missing data imputation using Evolutionary k-Nearest Neighbor algorithm for gene expression data

Gene expression data are widely recognized as a data source that commonly contains missing expression values. In this paper, we present a k-Nearest Neighbor algorithm optimized by a genetic algorithm (Evolutionary kNNImputation) for missing data imputation. In contrast to the common imputation methods, this paper examines the effectiveness of using supervised learning algorithms for missing data imputation. We compare the k-Nearest Neighbor Imputation algorithm (kNNImputation) with the proposed Evolutionary k-Nearest Neighbor Imputation algorithm. The two algorithms were tested on gene expression datasets: certain percentages of values were randomly deleted from the datasets, and the missing values were then recovered using each of the two algorithms. The results show that Evolutionary kNNImputation outperforms kNNImputation.
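
The evaluation protocol described above (randomly delete values, impute them, compare against the known originals) can be sketched as follows. This is a minimal illustration and not the authors' implementation. The abstract does not state what the genetic algorithm encodes or which error measure is used, so this sketch assumes that the GA searches only over the neighbor count k and that recovery is scored by root mean squared error. All function names (`knn_impute`, `mask_random`, `rmse`, `evolve_k`) and all parameter defaults are hypothetical.

```python
import numpy as np


def knn_impute(X, k):
    """Fill each NaN with the mean of that column over the k nearest rows.
    Distance is RMS difference over the columns both rows have observed."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)  # fallback when no donor row exists
    n_rows = X.shape[0]
    for i in np.where(missing.any(axis=1))[0]:
        dists = np.full(n_rows, np.inf)
        for j in range(n_rows):
            shared = ~missing[i] & ~missing[j]
            if j != i and shared.any():
                diff = X[i, shared] - X[j, shared]
                dists[j] = np.sqrt(np.mean(diff ** 2))
        order = np.argsort(dists)
        for c in np.where(missing[i])[0]:
            # Nearest rows that are comparable and have column c observed.
            donors = [j for j in order
                      if np.isfinite(dists[j]) and not missing[j, c]][:k]
            filled[i, c] = X[donors, c].mean() if donors else col_means[c]
    return filled


def mask_random(X, rate, rng):
    """Randomly delete a fraction `rate` of entries; return data and mask."""
    mask = rng.random(X.shape) < rate
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask


def rmse(X_true, X_filled, mask):
    """Root mean squared error over the deleted entries only."""
    return float(np.sqrt(np.mean((X_true[mask] - X_filled[mask]) ** 2)))


def evolve_k(X_missing, k_max, rng, pop_size=6, generations=5, val_rate=0.1):
    """Toy genetic search over k (ASSUMPTION: the paper's encoding may differ).
    Fitness is RMSE on extra known entries that are hidden for validation."""
    known = ~np.isnan(X_missing)
    val_mask = known & (rng.random(X_missing.shape) < val_rate)
    X_val = X_missing.copy()
    X_val[val_mask] = np.nan

    def fitness(k):
        return rmse(X_missing, knn_impute(X_val, k), val_mask)

    pop = rng.integers(1, k_max + 1, size=pop_size)
    for _ in range(generations):
        scores = np.array([fitness(int(k)) for k in pop])
        parents = pop[np.argsort(scores)[: pop_size // 2]]  # keep best half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.choice(parents, size=2)
            child = (a + b) // 2 + rng.integers(-1, 2)  # crossover + mutation
            children.append(int(np.clip(child, 1, k_max)))
        pop = np.concatenate([parents, children])
    scores = [fitness(int(k)) for k in pop]
    return int(pop[int(np.argmin(scores))])
```

To compare the two approaches under these assumptions, one would call `mask_random` on a complete expression matrix, impute once with a fixed k and once with the k returned by `evolve_k`, and compare the two `rmse` values on the deleted entries. Synthetic data is sufficient to exercise the code, but it says nothing about the results reported in the paper.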
