An Algorithm for Imputing Missing Values in Microarray Gene Expression Data

In recent years, rapid developments in genomics and proteomics have generated large amounts of biological data, and handling data at this scale with traditional analysis techniques has become extremely challenging. Bioinformatics, the interdisciplinary science of interpreting biological data, addresses this problem with the tools and techniques of information technology and computer science. Gene expression data generated by high-throughput microarray experiments, however, often contain missing values, which significantly degrade the performance of subsequent statistical analysis and clustering algorithms. There is therefore a great need to estimate, or impute, these missing values as accurately as possible. In general, missing values can be handled by several simple methods: ignoring the tuple, filling the missing value with the attribute mean, or filling it with a global constant. This paper proposes a new approach, called JAD (Java Application Development) imputation, that aims to estimate missing values more accurately. The results show that JAD imputation provides a better solution for completing microarray gene expression data.
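
For concreteness, the three simple strategies named above (ignoring the tuple, attribute-mean fill, and global-constant fill) can be sketched as follows. This is only an illustration of those baselines, not of the JAD method, whose details are not given here. The function names, the use of None as the missing-value marker, the toy matrix, and the layout (rows as genes, columns as attributes) are all assumptions made for the example.

```python
from statistics import mean

MISSING = None  # assumed marker for a missing expression value


def drop_incomplete_rows(matrix):
    """Ignore the tuple: keep only rows (genes) with no missing value."""
    return [row[:] for row in matrix if MISSING not in row]


def fill_with_constant(matrix, constant=0.0):
    """Replace every missing value with one global constant."""
    return [[constant if v is MISSING else v for v in row] for row in matrix]


def fill_with_column_mean(matrix):
    """Replace each missing value with the mean of the observed values
    in the same column (attribute)."""
    n_cols = len(matrix[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not MISSING]
        # Assumption: a column with no observed values falls back to 0.0.
        col_means.append(mean(observed) if observed else 0.0)
    return [
        [col_means[j] if v is MISSING else v for j, v in enumerate(row)]
        for row in matrix
    ]


# Hypothetical 3-gene x 3-condition expression matrix with two gaps.
toy = [
    [1.0, 2.0, 3.0],
    [4.0, None, 6.0],
    [7.0, 8.0, None],
]

complete_only = drop_incomplete_rows(toy)
constant_filled = fill_with_constant(toy, constant=0.0)
mean_filled = fill_with_column_mean(toy)
```

On this toy matrix, dropping incomplete tuples discards two of the three genes. This loss of data is one reason the simple strategies are often considered inadequate for expression data, where missing entries tend to be scattered across many rows.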