Missing Value Estimation in DNA Microarrays Using B-Splines

Gene expression profiles generated by high-throughput microarray experiments usually take the form of large, high-dimensional matrices. Unfortunately, microarray experiments can produce data sets with multiple missing values, which significantly degrade the performance of subsequent statistical analyses and machine learning algorithms. Numerous imputation algorithms have been proposed to estimate the missing values. However, most of these algorithms ignore the fact that gene expression profiles are continuous time series and instead treat them as discrete data. In this paper, we present a new approach, FDVSplineImpute, for time series gene expression analysis that estimates missing observations using B-splines of similar genes from fuzzy difference vectors. We use smoothing splines to relax the fit so that the splines are less prone to overfitting the data. Our algorithm shows significant improvement over current state-of-the-art methods.
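To illustrate the spline-based estimation step only, the following minimal sketch fits a smoothing B-spline to the observed time points of a single expression profile and evaluates it at the missing ones. It is not the FDVSplineImpute algorithm: the selection of similar genes and the fuzzy difference vectors are not reproduced here. The function name `impute_with_smoothing_spline`, its parameters, and the synthetic profile are assumptions made for illustration, and the fit relies on SciPy's `splrep` and `splev`.

```python
import numpy as np
from scipy.interpolate import splrep, splev


def impute_with_smoothing_spline(times, values, smoothing=0.0, degree=3):
    """Fill NaN entries of one profile from a smoothing B-spline (illustrative helper).

    `times` must be strictly increasing. `smoothing` is passed to splrep as `s`:
    0 interpolates the observed points, larger values relax the fit.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    observed = ~np.isnan(values)
    filled = values.copy()
    if observed.all():
        return filled  # nothing to impute
    if observed.sum() <= degree:
        raise ValueError("need more than `degree` observed points to fit the spline")
    # Fit the B-spline representation (knots, coefficients, degree) on observed points.
    tck = splrep(times[observed], values[observed], k=degree, s=smoothing)
    # Evaluate the fitted spline only where observations are missing.
    filled[~observed] = splev(times[~observed], tck)
    return filled


# Synthetic profile (made-up data): a cubic trend with two missing observations.
times = np.arange(8, dtype=float)
profile = times**3 - 2.0 * times
profile[[3, 5]] = np.nan

filled = impute_with_smoothing_spline(times, profile, smoothing=0.0)
smoothed = impute_with_smoothing_spline(times, profile, smoothing=1.0)
```

The `smoothing` argument is the knob that corresponds to relaxing the fit: at 0 the spline passes through every observed point, while positive values trade fidelity for smoothness and so reduce the risk of overfitting noisy measurements. Observed values are left unchanged in both cases; only the missing entries are replaced.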