Improving missing value imputation of microarray data by using spot quality weights

Background: Microarray technology has become popular for gene expression profiling, and many analysis tools have been developed for data interpretation. Most of these tools require complete data, but measurement values are often missing. One way to overcome the problem of incomplete data is to impute the missing values before analysis. Many imputation methods have been suggested, some naïve and others more sophisticated, taking into account correlations in the data. However, these methods are binary in the sense that each spot is considered either missing or present. Hence, they depend on a cutoff separating poor spots from good spots. We suggest a different approach in which a continuous spot quality weight is built into the imputation methods, allowing for smooth imputation of all spots to a greater or lesser degree.

Results: We assessed several imputation methods on three data sets containing replicate measurements and found that weighted methods performed better than non-weighted methods. Of the compared methods, the best performance and robustness were achieved with the weighted nearest neighbours method (WeNNI), in which both spot quality and correlations between genes were included in the imputation.

Conclusion: Including a measure of spot quality improves the accuracy of missing value imputation. WeNNI, the proposed method, is more accurate and less sensitive to parameters than the widely used kNNimpute and LSimpute algorithms.
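The idea of building a continuous spot quality weight into nearest-neighbour imputation can be illustrated with a minimal sketch. This is not the published WeNNI algorithm: the abstract gives no formulas, so the distance measure, the inverse-distance neighbour weighting, the blending rule `w * x + (1 - w) * estimate`, and the names `weighted_knn_impute`, `k` and `eps` are all assumptions chosen for illustration. Weights are assumed to lie in [0, 1], with 0 marking a missing spot.

```python
import numpy as np

def weighted_knn_impute(x, w, k=2, eps=1e-9):
    """Illustrative quality-weighted kNN imputation (assumed formulas).

    x: genes x samples expression matrix (NaN allowed for missing spots).
    w: spot quality weights in [0, 1], same shape as x.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = np.where(np.isnan(x), 0.0, w)   # a NaN spot carries no weight
    x = np.nan_to_num(x)                # placeholder 0.0, ignored since w == 0
    n_genes = x.shape[0]
    out = np.empty_like(x)
    for g in range(n_genes):
        # Weighted distance from gene g to every gene: each sample
        # contributes in proportion to the product of the two spot weights.
        pair_w = w[g] * w
        norm = pair_w.sum(axis=1)
        sq = (pair_w * (x[g] - x) ** 2).sum(axis=1)
        dist = np.full(n_genes, np.inf)
        ok = norm > 0
        dist[ok] = np.sqrt(sq[ok] / norm[ok])
        dist[g] = np.inf                # a gene is not its own neighbour
        nn = np.argsort(dist)[:k]
        nn = nn[np.isfinite(dist[nn])]
        # Neighbour estimate: closer genes and better spots count more.
        contrib = w[nn] / (dist[nn][:, None] + eps)
        denom = contrib.sum(axis=0)
        safe = np.where(denom > 0, denom, 1.0)
        est = np.where(denom > 0, (contrib * x[nn]).sum(axis=0) / safe, x[g])
        # Smooth blend: a perfect spot is kept, a missing one is replaced.
        out[g] = w[g] * x[g] + (1.0 - w[g]) * est
    return out
```

With all weights equal to 1 the function returns the input unchanged, and with weights restricted to 0 or 1 it behaves like a conventional binary nearest-neighbour imputation, so the binary methods appear as a special case of the weighted one. If no neighbour has a usable spot in a given sample, the sketch falls back to the gene's own value, which is the placeholder 0.0 for a missing spot.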
