Transformed Distribution Matching for Missing Value Imputation

We study the problem of imputing missing values in a dataset, a task with important applications in many domains. The key to missing value imputation is to capture the data distribution from incomplete samples and impute the missing values accordingly. In this paper, leveraging the fact that any two batches of data with missing values are drawn from the same underlying data distribution, we propose to impute the missing values of two batches of samples by transforming them into a latent space through deep invertible functions and matching them distributionally. To learn the transformations and impute the missing values simultaneously, we propose a simple and well-motivated algorithm. Our algorithm has fewer hyperparameters to fine-tune than existing approaches and generates high-quality imputations regardless of the underlying missingness mechanism. Extensive experiments on a large number of datasets against competing benchmark algorithms show that our method achieves state-of-the-art performance.
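To make the idea concrete, the sketch below jointly optimizes the imputed entries and an invertible map so that two randomly drawn batches, once transformed, match under an entropic optimal-transport (Sinkhorn-style) cost. The coupling-layer flow, the Sinkhorn matcher, and all hyperparameters (toy data, batch size, eps, learning rate) are illustrative assumptions, not the paper's reference implementation; the abstract only specifies that the transformation is a deep invertible function and that the matching is distributional.

```python
# Minimal sketch: joint imputation + invertible transform + distribution matching.
# All concrete choices here (coupling flow, entropic OT cost, hyperparameters)
# are assumptions for illustration, not the authors' implementation.
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """RealNVP-style affine coupling layer preceded by a fixed feature permutation.

    Invertible by construction: half of the features pass through unchanged and
    parameterize an affine update of the other half.
    """

    def __init__(self, dim, hidden=64):
        super().__init__()
        self.register_buffer("perm", torch.randperm(dim))
        self.d1, self.d2 = dim // 2, dim - dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.d1, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * self.d2),
        )

    def forward(self, x):
        x = x[:, self.perm]                              # vary the conditioning split per layer
        x1, x2 = x.split([self.d1, self.d2], dim=-1)
        s, t = self.net(x1).chunk(2, dim=-1)
        y2 = x2 * torch.exp(torch.tanh(s)) + t           # bounded scale keeps training stable
        return torch.cat([x1, y2], dim=-1)


def entropic_ot_cost(za, zb, eps=0.05, iters=100):
    """Entropic OT cost between two uniformly weighted batches (log-domain Sinkhorn).

    This is the plain regularized transport cost; a full Sinkhorn divergence
    would additionally subtract the debiasing self-transport terms.
    """
    C = torch.cdist(za, zb) ** 2                         # pairwise squared Euclidean costs
    n, m = C.shape
    f = torch.zeros(n, device=C.device)
    g = torch.zeros(m, device=C.device)
    log_a = -torch.log(torch.tensor(float(n), device=C.device))
    log_b = -torch.log(torch.tensor(float(m), device=C.device))
    for _ in range(iters):
        f = -eps * torch.logsumexp((g[None, :] - C) / eps + log_b, dim=1)
        g = -eps * torch.logsumexp((f[:, None] - C) / eps + log_a, dim=0)
    P = torch.exp((f[:, None] + g[None, :] - C) / eps + log_a + log_b)  # transport plan
    return (P * C).sum()


# Toy setup: X is a stand-in dataset, mask marks the entries treated as missing.
torch.manual_seed(0)
X = torch.randn(512, 8)
mask = torch.rand_like(X) < 0.3                          # True = missing
imputed = nn.Parameter(torch.zeros_like(X))              # imputations are free parameters
flow = nn.Sequential(AffineCoupling(8), AffineCoupling(8), AffineCoupling(8))
opt = torch.optim.Adam(list(flow.parameters()) + [imputed], lr=1e-2)

for step in range(2000):
    X_hat = torch.where(mask, imputed, X)                # observed entries are never altered
    idx = torch.randperm(X.shape[0])
    batch_a, batch_b = X_hat[idx[:128]], X_hat[idx[128:256]]
    loss = entropic_ot_cost(flow(batch_a), flow(batch_b))  # match the two transformed batches
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Keeping the imputations as free parameters behind torch.where guarantees the observed entries are never modified, and an invertible map cannot be constant, so the trivial solution of collapsing every batch to a single latent point is ruled out; practical implementations may still add regularization to discourage the transform from shrinking distances.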
