Recovering Data Permutations From Noisy Observations: The Linear Regime

This article considers a noisy data structure recovery problem. The goal is to investigate the following question: given a noisy observation of a permuted data set, according to which permutation was the original data sorted? The focus is on scenarios where the data is generated according to an isotropic Gaussian distribution and the noise is additive Gaussian with an arbitrary covariance matrix. The problem is posed within a hypothesis testing framework. The objective is to study the linear regime, in which the optimal decoder has polynomial complexity in the data size and declares the permutation by simply computing a permutation-independent linear function of the noisy observations. The main result of this article is a complete characterization of the linear regime in terms of the noise covariance matrix. Specifically, it is shown that this matrix must have a very flat spectrum, with at most three distinct eigenvalues, to induce the linear regime. Several practically relevant implications of this result are discussed, and the error probability incurred by the decision criterion in the linear regime is also characterized. A core technical component consists of using linear algebraic and geometric tools, such as Steiner symmetrization.
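
The setting described above can be illustrated with a small simulation. The following is a minimal sketch, not the decoder derived in the article: it assumes, purely for illustration, that the linear function applied to the observations is the identity (a natural choice when the noise is itself isotropic) and that the declared permutation is the one that sorts the transformed observation. The function names `sample_observation`, `sorting_permutation`, `linear_decoder`, and `empirical_error`, and the optional matrix `A`, are hypothetical and do not come from the article.

```python
import numpy as np


def sample_observation(n, noise_cov, rng):
    """Draw isotropic Gaussian data x and a noisy observation y = x + z."""
    x = rng.standard_normal(n)                           # x ~ N(0, I)
    z = rng.multivariate_normal(np.zeros(n), noise_cov)  # z ~ N(0, noise_cov)
    return x, x + z


def sorting_permutation(v):
    """Return the permutation (as indices) that sorts v in ascending order."""
    return np.argsort(v, kind="stable")


def linear_decoder(y, A=None):
    """Declare the permutation that sorts a linear function A @ y.

    Assumption: A defaults to the identity; the article's actual linear
    function depends on the noise covariance and is not reproduced here.
    """
    A = np.eye(len(y)) if A is None else A
    return sorting_permutation(A @ y)


def empirical_error(n, noise_cov, trials, seed=0):
    """Fraction of trials in which the declared permutation is wrong."""
    rng = np.random.default_rng(seed)
    errors = 0
    for _ in range(trials):
        x, y = sample_observation(n, noise_cov, rng)
        if not np.array_equal(sorting_permutation(x), linear_decoder(y)):
            errors += 1
    return errors / trials


if __name__ == "__main__":
    n = 4
    for sigma2 in (0.01, 0.1, 1.0):
        rate = empirical_error(n, sigma2 * np.eye(n), trials=2000)
        print(f"noise variance {sigma2}: empirical error rate {rate:.3f}")
```

The decoder costs one matrix-vector product and one sort, which is polynomial in the data size, in contrast to an exhaustive comparison over all n! candidate permutations. Passing a non-isotropic `noise_cov` to `empirical_error` shows how this sort-based rule behaves outside the isotropic case, where it need not be optimal.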
