An ensemble based missing value estimation in DNA microarray using artificial neural network

DNA microarrays are normally used to measure the expression values of thousands of genes simultaneously, in the form of large matrices. The raw gene expression data may contain missing cells, and these missing values can affect any analysis subsequently performed on the data. Several imputation methods, such as K-Nearest Neighbor Imputation (KNNImpute), Singular Value Decomposition Imputation (SVDImpute), Local Least Squares Imputation (LLSImpute), and Bayesian Principal Component Analysis (BPCAImpute), have already been proposed to impute those missing values. In this work we propose ANNImpute, an ensemble-classifier-based Artificial Neural Network implementation that aims to improve the accuracy of missing value imputation by applying a Two Layer Perceptron Learning algorithm. The ensemble is formed over parameters such as the learning rate α, the weight vector, and the bias. We have applied our algorithm to two benchmark datasets, SPELLMAN and Tumour (GDS2932), and the results show that this approach performs well compared with existing methods in terms of RMSE.
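As a rough illustration of the idea, and not the ANNImpute implementation itself, the following sketch trains a few small networks that differ in learning rate and in initial weights and biases, averages their predictions for each missing cell, and scores the result with RMSE. The network shape (one tanh hidden layer with a linear output), the averaging rule, the use of complete rows as training data, and all function names are assumptions made for this example.

```python
import numpy as np


def rmse(actual, estimated):
    """Root mean squared error between true and imputed values."""
    actual = np.asarray(actual, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean((actual - estimated) ** 2)))


def train_member(X, y, learning_rate, seed, hidden=4, epochs=500):
    """Train one ensemble member by batch gradient descent.

    Assumed architecture: tanh hidden layer, linear output unit.
    The seed controls the initial weight vectors and biases.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.1, size=(X.shape[1], hidden))
    b1 = rng.normal(scale=0.1, size=hidden)
    W2 = rng.normal(scale=0.1, size=hidden)
    b2 = float(rng.normal(scale=0.1))
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                     # hidden activations
        err = H @ W2 + b2 - y                        # prediction error
        grad_H = np.outer(err, W2) * (1.0 - H ** 2)  # backpropagated error
        W2 -= learning_rate * (H.T @ err) / n
        b2 -= learning_rate * err.mean()
        W1 -= learning_rate * (X.T @ grad_H) / n
        b1 -= learning_rate * grad_H.mean(axis=0)
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2 + b2


def ensemble_impute(data, learning_rates=(0.01, 0.05, 0.1), seeds=(0, 1, 2)):
    """Fill each NaN cell with the mean prediction of the ensemble members."""
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    # Rows without any missing value serve as training examples.
    complete = data[~np.isnan(data).any(axis=1)]
    for i, j in zip(*np.where(np.isnan(data))):
        cols = np.where(~np.isnan(data[i]))[0]       # observed samples of gene i
        X, y = complete[:, cols], complete[:, j]
        members = [train_member(X, y, lr, s)
                   for lr, s in zip(learning_rates, seeds)]
        filled[i, j] = np.mean([m(data[i, cols]) for m in members])
    return filled


# Synthetic stand-in for an expression matrix (30 genes x 5 samples).
rng = np.random.default_rng(42)
base = rng.normal(size=(30, 1))
truth = base * np.array([1.0, 0.8, -0.5, 0.3, 1.2]) + 0.05 * rng.normal(size=(30, 5))
observed = truth.copy()
missing = [(2, 1), (7, 3), (11, 0)]
for i, j in missing:
    observed[i, j] = np.nan

imputed = ensemble_impute(observed)
rows, cols = zip(*missing)
print("RMSE on masked cells:", rmse(truth[rows, cols], imputed[rows, cols]))
```

Masking known cells and comparing the imputed values against the held-out truth follows the usual evaluation protocol for imputation methods; the synthetic matrix above merely stands in for a real dataset.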
