Imputing Structured Missing Values in Spatial Data with Clustered Adversarial Matrix Factorization

Missing data often pose a significant challenge because they may introduce uncertainty into data analysis. Recent advances in matrix completion have shown competitive imputation performance in many real-world domains. However, applying matrix completion methods to spatial data has two major limitations. First, these methods make the strong assumption that entries are missing at random, which may not hold for spatial data. Second, they may not effectively exploit the underlying spatial structure of the data. To address these limitations, this paper presents a novel clustered adversarial matrix factorization method that explores and exploits the underlying cluster structure of spatial data to facilitate effective imputation. The proposed method uses an adversarial network to learn the joint probability distribution of the variables and to improve imputation performance for missing entries that are not randomly sampled.
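
To make the setting concrete, the following is a minimal sketch of the plain matrix factorization baseline that the abstract builds on, applied to a matrix whose missing entries form a contiguous block rather than a random sample. It is not the clustered adversarial method proposed in the paper: the clustering and adversarial components are omitted, and the function name `als_impute`, its parameters, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def als_impute(X, mask, rank=2, reg=1e-3, n_iters=50, seed=0):
    """Fill missing entries of X with a rank-`rank` factorization U @ V.T.

    Hypothetical helper for illustration only. `mask` is True where X is
    observed. The factors are fit by alternating ridge-regularized least
    squares on the observed entries.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.standard_normal((n, rank))
    V = rng.standard_normal((m, rank))
    eye = reg * np.eye(rank)
    for _ in range(n_iters):
        # Update each row factor using only the columns observed in that row.
        for i in range(n):
            obs = mask[i]
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + eye, Vo.T @ X[i, obs])
        # Update each column factor using only the rows observed in that column.
        for j in range(m):
            obs = mask[:, j]
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + eye, Uo.T @ X[obs, j])
    # Keep observed values as they are and fill only the missing ones.
    return np.where(mask, X, U @ V.T)

# Synthetic rank-2 data (an assumption, not data from the paper).
rng = np.random.default_rng(1)
X_true = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))

# Structured missingness: one block of rows lacks one block of columns,
# e.g. a region where certain variables were never measured.
mask = np.ones_like(X_true, dtype=bool)
mask[:10, :5] = False

X_missing = np.where(mask, X_true, np.nan)
X_filled = als_impute(X_missing, mask)
```

Alternating least squares is used here only because each update has a closed form; the sketch makes no claim about which optimizer the paper uses. Comparing `X_filled[~mask]` with `X_true[~mask]` gives the imputation error on the structured block.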
