Paper information - Linear Regression With Shuffled Data: Statistical and Computational Limits of Permutation Recovery

Consider a noisy linear observation model with an unknown permutation, based on observing $y = \Pi^{*} A x^{*} + w$, where $x^{*} \in \mathbb{R}^{d}$ is an unknown vector, $\Pi^{*}$ is an unknown $n \times n$ permutation matrix, and $w \in \mathbb{R}^{n}$ is additive Gaussian noise. We analyze the problem of permutation recovery in a random design setting in which the entries of matrix $A$ are drawn independently from a standard Gaussian distribution and establish sharp conditions on the signal-to-noise ratio, sample size $n$, and dimension $d$ under which $\Pi^{*}$ is exactly and approximately recoverable. On the computational front, we show that the maximum likelihood estimate of $\Pi^{*}$ is NP-hard to compute for general $d$, while also providing a polynomial time algorithm when $d = 1$.
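To make the $d = 1$ case concrete, here is a minimal sketch of one way to compute the maximum likelihood permutation by sorting. It is an illustration under stated assumptions, not necessarily the authors' exact procedure. With $d = 1$ the design is a single column $a$, and for a fixed permutation the least-squares fit over the scalar $x$ leaves residual $\|y\|^2 - (y^\top \Pi a)^2 / \|a\|^2$, so the estimate maximizes $|y^\top \Pi a|$. By the rearrangement inequality the maximizer pairs the sorted entries of $y$ with the entries of $a$ sorted in the same or the opposite order. The function name `ml_permutation_d1` and the index convention $(\Pi a)_i = a[\mathrm{perm}[i]]$ are assumptions made for this example.

```python
import numpy as np

def ml_permutation_d1(y, a):
    """Sketch of the ML permutation for d = 1 (hypothetical helper).

    Returns an index array `perm` such that a[perm] plays the role of
    Pi @ a in the model y = Pi a x + w.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    n = len(y)
    y_order = np.argsort(y, kind="stable")   # positions of y from smallest to largest
    a_sorted = np.argsort(a, kind="stable")  # indices of a from smallest to largest

    candidates = []
    # Same order covers x > 0, reversed order covers x < 0.
    for a_order in (a_sorted, a_sorted[::-1]):
        perm = np.empty(n, dtype=int)
        perm[y_order] = a_order  # i-th smallest y is paired with i-th entry of a_order
        candidates.append(perm)

    # Maximizing |y . a[perm]| minimizes the least-squares residual over x.
    return max(candidates, key=lambda p: abs(y @ a[p]))

# Illustrative simulation of the observation model with d = 1.
rng = np.random.default_rng(1)
n = 10
a = rng.standard_normal(n)        # random Gaussian design column
perm_true = rng.permutation(n)    # unknown permutation
x_true = 2.5                      # unknown scalar signal
y = x_true * a[perm_true] + 1e-3 * rng.standard_normal(n)
perm_hat = ml_permutation_d1(y, a)
print("fraction of positions recovered:", np.mean(perm_hat == perm_true))
```

The cost is dominated by the two sorts, so the sketch runs in $O(n \log n)$ time. Whether the returned permutation matches the true one depends on the noise level relative to the gaps between entries of $a$.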
