Regularized Sparse Modelling for Microarray Missing Value Estimation

The existence of missing values in microarray data inevitably hinders downstream biological analyses that require complete data as input. Effectively exploiting the underlying structure of the data to accurately estimate the missing entries therefore remains an important problem. In this study, we formulate the problem within a regularized sparse framework and propose local learning-based imputation models that capture the relationships hidden in gene expression profiles to improve imputation. Specifically, motivated by the ability of the elastic net penalty to perform variable selection and grouping simultaneously, we present an elastic net regularized, local least squares-based imputation method that estimates the missing entries of a target gene from its neighbor genes. In addition, we investigate different similarity filtering metrics for selecting neighbor genes and develop four further imputation methods under the same framework. Furthermore, the proposed methods process the target genes in ascending order of their missing rates. Finally, we conduct extensive comparative experiments against eight other commonly used methods on multiple microarray datasets with varying missing rates. The results indicate the effectiveness of sparse regularization techniques and the superiority of the elastic net over its competitors in terms of statistical analysis metrics.
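The abstract describes the pipeline only at a high level, so the following is a minimal pure-Python sketch of the general idea rather than the authors' implementation. Several details are assumptions not stated in the text. Neighbor genes are chosen by Euclidean distance on the target gene's observed samples, which is just one possible similarity filter. Only genes that are complete at that point serve as candidates, so earlier-imputed targets can be reused. The elastic net is solved by plain coordinate descent on the scikit-learn-style objective. All names (`impute`, `elastic_net_fit`, `soft_threshold`, `k`, `alpha`, `l1_ratio`) are hypothetical.

```python
import math


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) from the L1 part of the penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_fit(X, y, alpha=0.1, l1_ratio=0.5, n_iter=500, tol=1e-10):
    """Coordinate descent for the (assumed) objective
    (1/2n)||y - b0 - Xw||^2 + alpha*(l1_ratio*||w||_1 + (1 - l1_ratio)/2*||w||^2).
    X: n rows (samples) by p columns (neighbor genes). Returns (b0, w)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center X and y so the unpenalized intercept can be recovered afterwards.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual of the centered fit while w = 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            denom = col_sq[j] + alpha * (1 - l1_ratio)
            new_wj = soft_threshold(rho, alpha * l1_ratio) / denom if denom > 0 else 0.0
            delta = new_wj - w[j]
            for i in range(n):
                resid[i] -= Xc[i][j] * delta
            w[j] = new_wj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    b0 = y_mean - sum(x_mean[j] * w[j] for j in range(p))
    return b0, w


def impute(data, k=3, alpha=0.1, l1_ratio=0.5):
    """data: list of gene rows (columns = samples); None marks a missing entry."""
    data = [row[:] for row in data]

    def n_missing(row):
        return sum(v is None for v in row)

    # Process target genes in ascending order of their missing rate.
    targets = sorted((g for g, row in enumerate(data) if n_missing(row) > 0),
                     key=lambda g: n_missing(data[g]))
    for g in targets:
        row = data[g]
        obs = [c for c, v in enumerate(row) if v is not None]
        mis = [c for c, v in enumerate(row) if v is None]
        # Assumption: candidates are genes complete at this point, so targets
        # imputed earlier in the ordering can be reused as neighbors.
        cands = [h for h, other in enumerate(data) if h != g and n_missing(other) == 0]
        if not obs or not cands:
            # Hypothetical fallback: row mean (or 0.0) when no regression is possible.
            fill = sum(row[c] for c in obs) / len(obs) if obs else 0.0
            for c in mis:
                row[c] = fill
            continue

        # Assumption: Euclidean distance on the target's observed samples.
        def dist(h):
            return math.sqrt(sum((data[h][c] - row[c]) ** 2 for c in obs))

        nbrs = sorted(cands, key=dist)[:k]
        X = [[data[h][c] for h in nbrs] for c in obs]  # one regression row per sample
        y = [row[c] for c in obs]
        b0, w = elastic_net_fit(X, y, alpha, l1_ratio)
        for c in mis:
            row[c] = b0 + sum(wj * data[h][c] for wj, h in zip(w, nbrs))
    return data
```

Here the elastic net plays the role of the regularized local least squares step: the L1 term can drop irrelevant neighbors, while the L2 term keeps strongly correlated neighbors from destabilizing the fit. Swapping the distance function in `dist` is where a different similarity filtering metric would plug in.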
