Permuted and Unlinked Monotone Regression in $\mathbb{R}^d$: an approach based on mixture modeling and optimal transport

Suppose that we have a regression problem with response variable $Y \in \mathbb{R}^d$ and predictor $X \in \mathbb{R}^d$, for $d \geq 1$. In permuted or unlinked regression we have access to separate unordered data on $X$ and $Y$, as opposed to data on $(X, Y)$-pairs as in usual regression. So far, the literature has focused on the case $d = 1$; see, e.g., the recent papers by Rigollet and Weed [Information & Inference, 8, 619–717] and Balabdaoui et al. [J. Mach. Learn. Res., 22(172), 1–60]. In this paper, we consider the general multivariate setting with $d \geq 1$. We show that the notion of cyclical monotonicity of the regression function is sufficient for identification and estimation in the permuted/unlinked regression model. We study permutation recovery in the permuted regression setting and develop a computationally efficient and easy-to-use algorithm for denoising based on the Kiefer-Wolfowitz [Ann. Math. Statist., 27, 887–906] nonparametric maximum likelihood estimator and techniques from the theory of optimal transport. We provide explicit upper bounds on the associated mean squared denoising error for Gaussian noise. As in previous work on the case $d = 1$, the permuted/unlinked setting involves slow (logarithmic) rates of convergence, rooted in the underlying deconvolution problem. Numerical studies corroborate our theoretical analysis and show that the proposed approach performs at least on par with the methods in the aforementioned prior work in the case $d = 1$ while achieving substantial reductions in computational complexity.
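To make the two ingredients named in the abstract concrete, the following is a minimal sketch for the special case $d = 1$ only, and it is not the paper's algorithm. It rests on several assumptions that do not come from the abstract: the noise level `sigma` is known, the regression function is nondecreasing, the mixing measure is approximated on a fixed grid of atoms, and the weights are fitted by plain EM iterations rather than a dedicated convex solver. The function names `npmle_weights` and `denoise_unlinked_1d` are hypothetical. In one dimension, optimal transport with quadratic cost between the sample of $X$ and the estimated mixing measure reduces to matching sorted values with quantiles, which is what the second function does.

```python
import numpy as np


def npmle_weights(y, atoms, sigma, n_iter=200):
    """EM iterations for the weights of a Gaussian location mixture
    whose atoms are fixed on a grid (a grid approximation of the NPMLE)."""
    # lik[i, j] is proportional to the N(atoms[j], sigma^2) density at y[i].
    z = (y[:, None] - atoms[None, :]) / sigma
    lik = np.exp(-0.5 * z ** 2)
    w = np.full(atoms.size, 1.0 / atoms.size)
    for _ in range(n_iter):
        mix = lik @ w                            # mixture density at each y[i]
        w = w * (lik.T @ (1.0 / mix)) / y.size   # EM update, keeps sum(w) == 1
    return w


def denoise_unlinked_1d(x, y, sigma, n_atoms=200):
    """Assign to each x[i] a denoised response by quantile matching
    (the 1-D optimal transport map) against the fitted mixing measure."""
    atoms = np.linspace(y.min(), y.max(), n_atoms)
    w = npmle_weights(y, atoms, sigma)
    cdf = np.cumsum(w)
    # Mid-point quantile levels for the n order statistics of x.
    levels = (np.arange(x.size) + 0.5) / x.size
    idx = np.clip(np.searchsorted(cdf, levels), 0, n_atoms - 1)
    quantiles = atoms[idx]                       # nondecreasing in the level
    fitted = np.empty_like(quantiles)
    fitted[np.argsort(x)] = quantiles            # smallest x gets smallest quantile
    return fitted, atoms, w


# Synthetic illustration: y is observed without its link to x.
rng = np.random.default_rng(0)
n, sigma = 300, 0.3
x = rng.uniform(0.0, 1.0, n)
y = rng.permutation(4.0 * x ** 3 + sigma * rng.normal(size=n))
fitted, atoms, w = denoise_unlinked_1d(x, y, sigma)
```

For $d > 1$ sorting is no longer available, and the quantile-matching step would have to be replaced by a numerical optimal transport solver; that part is deliberately left out of this sketch.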
