Transformed Distribution Matching for Missing Value Imputation

We study the problem of imputing missing values in a dataset, a task with important applications in many domains. The key to missing value imputation is to capture the data distribution from incomplete samples and impute the missing values accordingly. In this paper, leveraging the fact that any two batches of data with missing values are drawn from the same underlying data distribution, we propose to impute the missing values of two batches of samples by transforming them into a latent space through deep invertible functions and matching them distributionally. To learn the transformations and impute the missing values simultaneously, we propose a simple and well-motivated algorithm. Our algorithm has fewer hyperparameters to fine-tune than existing approaches and generates high-quality imputations regardless of the underlying missingness mechanism. Extensive experiments on a large number of datasets against competing benchmark algorithms show that our method achieves state-of-the-art performance.
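To make the idea concrete, the sketch below jointly optimizes the imputed entries and an invertible map so that two randomly drawn batches, once transformed, match under an entropic optimal-transport (Sinkhorn-style) cost. The coupling-layer flow, the Sinkhorn matcher, and all hyperparameters (toy data, batch size, eps, learning rate) are illustrative assumptions, not the paper's reference implementation; the abstract only specifies that the transformation is a deep invertible function and that the matching is distributional.

```python
# Minimal sketch: joint imputation + invertible transform + distribution matching.
# All concrete choices here (coupling flow, entropic OT cost, hyperparameters)
# are assumptions for illustration, not the authors' implementation.
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """RealNVP-style affine coupling layer preceded by a fixed feature permutation.

    Invertible by construction: half of the features pass through unchanged and
    parameterize an affine update of the other half.
    """

    def __init__(self, dim, hidden=64):
        super().__init__()
        self.register_buffer("perm", torch.randperm(dim))
        self.d1, self.d2 = dim // 2, dim - dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.d1, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * self.d2),
        )

    def forward(self, x):
        x = x[:, self.perm]                              # vary the conditioning split per layer
        x1, x2 = x.split([self.d1, self.d2], dim=-1)
        s, t = self.net(x1).chunk(2, dim=-1)
        y2 = x2 * torch.exp(torch.tanh(s)) + t           # bounded scale keeps training stable
        return torch.cat([x1, y2], dim=-1)


def entropic_ot_cost(za, zb, eps=0.05, iters=100):
    """Entropic OT cost between two uniformly weighted batches (log-domain Sinkhorn).

    This is the plain regularized transport cost; a full Sinkhorn divergence
    would additionally subtract the debiasing self-transport terms.
    """
    C = torch.cdist(za, zb) ** 2                         # pairwise squared Euclidean costs
    n, m = C.shape
    f = torch.zeros(n, device=C.device)
    g = torch.zeros(m, device=C.device)
    log_a = -torch.log(torch.tensor(float(n), device=C.device))
    log_b = -torch.log(torch.tensor(float(m), device=C.device))
    for _ in range(iters):
        f = -eps * torch.logsumexp((g[None, :] - C) / eps + log_b, dim=1)
        g = -eps * torch.logsumexp((f[:, None] - C) / eps + log_a, dim=0)
    P = torch.exp((f[:, None] + g[None, :] - C) / eps + log_a + log_b)  # transport plan
    return (P * C).sum()


# Toy setup: X is a stand-in dataset, mask marks the entries treated as missing.
torch.manual_seed(0)
X = torch.randn(512, 8)
mask = torch.rand_like(X) < 0.3                          # True = missing
imputed = nn.Parameter(torch.zeros_like(X))              # imputations are free parameters
flow = nn.Sequential(AffineCoupling(8), AffineCoupling(8), AffineCoupling(8))
opt = torch.optim.Adam(list(flow.parameters()) + [imputed], lr=1e-2)

for step in range(2000):
    X_hat = torch.where(mask, imputed, X)                # observed entries are never altered
    idx = torch.randperm(X.shape[0])
    batch_a, batch_b = X_hat[idx[:128]], X_hat[idx[128:256]]
    loss = entropic_ot_cost(flow(batch_a), flow(batch_b))  # match the two transformed batches
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Keeping the imputations as free parameters behind torch.where guarantees the observed entries are never modified, and an invertible map cannot be constant, so the trivial solution of collapsing every batch to a single latent point is ruled out; practical implementations may still add regularization to discourage the transform from shrinking distances.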
