Sublinear Least-Squares Value Iteration via Locality Sensitive Hashing

We present the first provable Least-Squares Value Iteration (LSVI) algorithms that achieve runtime complexity sublinear in the number of actions. We formulate the value-function estimation procedure in value iteration as an approximate maximum inner product search (MaxIP) problem and propose a locality sensitive hashing (LSH) [Indyk and Motwani, STOC’98; Andoni and Razenshteyn, STOC’15; Andoni, Laarhoven, Razenshteyn and Waingarten, SODA’17] type data structure that solves this problem in sublinear time. Moreover, we establish connections between the theory of approximate maximum inner product search and the regret analysis of reinforcement learning. We prove that, with our choice of approximation factor, our Sublinear LSVI algorithms maintain the same regret as the original LSVI algorithms while reducing the runtime complexity to sublinear in the number of actions. To the best of our knowledge, this is the first work to combine LSH with reinforcement learning and obtain provable improvements. We hope that our novel way of combining data structures with an iterative algorithm will open the door to further study of cost reduction in optimization.

∗ anshumali@rice.edu. Rice University.
† zhaos@ias.edu. Institute for Advanced Study, Princeton University.
‡ zx22@rice.edu. Rice University.
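To make the reduction concrete, below is a minimal Python sketch of the generic MaxIP-via-LSH pattern the abstract describes; it is not the paper's exact construction. Action features are lifted with the asymmetric transform of Neyshabur and Srebro [25] so that inner products become cosine similarities, indexed with SimHash [36] tables, and each value update queries the index with the current regression weights to approximate argmax_a ⟨w, φ(a)⟩. The class name SimHashMaxIP, the toy feature matrix phi, and all table parameters are illustrative assumptions; for simplicity the features here depend on the action only, and an empty candidate set falls back to an exact scan.

```python
import numpy as np

class SimHashMaxIP:
    """Illustrative sketch (not the paper's data structure): SimHash tables
    over asymmetrically lifted points, queried for an approximate MaxIP."""

    def __init__(self, phi, n_tables=20, n_bits=12, seed=0):
        rng = np.random.default_rng(seed)
        self.phi = phi
        # Neyshabur-Srebro lift: scale points into the unit ball and append
        # sqrt(1 - ||x||^2), so inner products with a zero-padded unit-norm
        # query reduce to cosine similarities.
        scaled = phi / np.linalg.norm(phi, axis=1).max()
        pad = np.sqrt(np.maximum(0.0, 1.0 - (scaled ** 2).sum(axis=1)))
        lifted = np.hstack([scaled, pad[:, None]])              # (n, d+1)
        # Random hyperplanes define n_bits sign bits per table (SimHash).
        self.planes = rng.standard_normal((n_tables, n_bits, lifted.shape[1]))
        self.weights = 1 << np.arange(n_bits)                   # bits -> bucket key
        self.tables = []
        for t in range(len(self.planes)):
            keys = ((lifted @ self.planes[t].T) > 0) @ self.weights
            table = {}
            for i, k in enumerate(keys):
                table.setdefault(int(k), []).append(i)
            self.tables.append(table)

    def query(self, w):
        # Lift the query with a zero pad; rescaling w does not change the argmax.
        q = np.append(w / np.linalg.norm(w), 0.0)
        cand = set()
        for t, table in enumerate(self.tables):
            key = int(((self.planes[t] @ q) > 0) @ self.weights)
            cand.update(table.get(key, []))
        if not cand:                 # rare total miss: fall back to exact scan
            cand = range(len(self.phi))
        idx = np.fromiter(cand, dtype=int)
        # Rescore only the colliding candidates with exact inner products.
        return idx[np.argmax(self.phi[idx] @ w)]


# Hypothetical usage inside value iteration: preprocess action features once,
# then each Q-update queries the index with the current weight vector.
phi = np.random.default_rng(1).standard_normal((100_000, 32))   # toy features
index = SimHashMaxIP(phi)
w_h = np.random.default_rng(2).standard_normal(32)              # LSVI weights
a_star = index.query(w_h)   # approximate argmax_a <w_h, phi[a]>
```

The index is built once and queried at every value-iteration update, so the per-query cost is governed by the number of colliding candidates rather than by the full action set; this preprocess-once, query-many pattern is what makes the maximization step sublinear in the number of actions.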

[1] Inderjit S. Dhillon, et al. Linear Bandit Algorithms with Sublinear Time Complexity. arXiv, 2021.

[2] Lin F. Yang, et al. A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost. arXiv, 2021.

[3] Omri Weinstein, et al. Training (Overparametrized) Neural Networks in Near-Linear Time. ITCS, 2021.

[4] Parikshit Ram, et al. Maximum inner-product search using cone trees. KDD, 2012.

[5] Tobias Christiani, et al. A Framework for Similarity Search with Space-Time Tradeoffs using Locality-Sensitive Filtering. SODA, 2016.

[6] Chi Jin, et al. Provably Efficient Exploration in Policy Optimization. ICML, 2019.

[7] Simon S. Du, et al. Near-Optimal Randomized Exploration for Tabular Markov Decision Processes. NeurIPS, 2021.

[8] Omri Weinstein, et al. Faster Dynamic Matrix Inverse for Faster LPs. arXiv, 2020.

[9] Anshumali Shrivastava, et al. Climbing the WOL: Training for Cheaper Inference. arXiv, 2020.

[10] Cho-Jui Hsieh, et al. A Fast Sampling Algorithm for Maximum Inner Product Search. AISTATS, 2019.

[11] Steven J. Bradtke, et al. Linear Least-Squares Algorithms for Temporal Difference Learning. Machine Learning, 2004.

[12] W. B. Johnson, et al. Extensions of Lipschitz mappings into a Hilbert space. 1984.

[13] Lijie Chen, et al. On The Hardness of Approximate and Exact (Bichromatic) Maximum Inner Product. ECCC, 2018.

[14] Ryan Williams, et al. An Equivalence Class for Orthogonal Vectors. SODA, 2018.

[15] Michael I. Jordan, et al. Is Q-learning Provably Efficient? NeurIPS, 2018.

[16] Ruosong Wang, et al. Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning. NeurIPS, 2020.

[17] Ping Li, et al. Möbius Transformation for Fast Inner Product Search on Graph. NeurIPS, 2019.

[18] Yu Bai, et al. Provably Efficient Q-Learning with Low Switching Cost. NeurIPS, 2019.

[19] Ryan Williams, et al. On the Difference Between Closest, Furthest, and Orthogonal Pairs: Nearly-Linear vs Barely-Subquadratic Complexity. SODA, 2017.

[20] Ameya Velingker, et al. Scaling up Kernel Ridge Regression via Locality Sensitive Hashing. AISTATS, 2020.

[21] Moses Charikar, et al. Hashing-Based-Estimators for Kernel Density in High Dimensions. FOCS, 2017.

[22] Shachar Lovett, et al. Bilinear Classes: A Structural Framework for Provable Generalization in RL. ICML, 2021.

[23] Zhao Song, et al. Solving tall dense linear programs in nearly linear time. STOC, 2020.

[24] Alexandr Andoni, et al. Data-dependent hashing via nonlinear spectral gaps. STOC, 2018.

[25] Nathan Srebro, et al. On Symmetric and Asymmetric LSHs for Inner Product Search. ICML, 2014.

[26] Nicholas Jing Yuan, et al. DRN: A Deep Reinforcement Learning Framework for News Recommendation. WWW, 2018.

[27] Alexandr Andoni, et al. LSH Forest: Practical Algorithms Made Theoretical. SODA, 2017.

[28] Guy Lever, et al. Deterministic Policy Gradient Algorithms. ICML, 2014.

[29] Piotr Indyk, et al. Efficient Density Evaluation for Smooth Kernels. FOCS, 2018.

[30] Philip Levis, et al. Rehashing Kernel Evaluation in High Dimensions. ICML, 2019.

[31] Piotr Indyk, et al. Approximate nearest neighbors: towards removing the curse of dimensionality. STOC, 1998.

[32] Alexandr Andoni, et al. Optimal Data-Dependent Hashing for Approximate Near Neighbors. STOC, 2015.

[33] Sally Dong, et al. A nearly-linear time algorithm for linear programs with small treewidth: a multiscale representation of robust central path. STOC, 2021.

[34] Zhao Song, et al. Efficient Model-free Reinforcement Learning in Metric Spaces. arXiv, 2019.

[35] Jinfeng Li, et al. Norm-Ranging LSH for Maximum Inner Product Search. NeurIPS, 2018.

[36] Moses Charikar, et al. Similarity estimation techniques from rounding algorithms. STOC, 2002.

[37] Ilya P. Razenshteyn. High-dimensional similarity search and sketching: algorithms and hardness. PhD thesis, 2017.

[38] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits. NIPS, 2011.

[39] Ruosong Wang, et al. Planning with General Objective Functions: Going Beyond Total Rewards. NeurIPS, 2020.

[40] Piotr Indyk, et al. Learning Space Partitions for Nearest Neighbor Search. ICLR, 2019.

[41] Russell Impagliazzo, et al. Complexity of k-SAT. IEEE Conference on Computational Complexity (CCC), 1999.

[42] Rasmus Pagh. Locality-sensitive Hashing without False Negatives. SODA, 2016.

[43] Ping Li, et al. Asymmetric Minwise Hashing for Indexing Binary Inner Products and Set Containment. WWW, 2015.

[44] Jan van den Brand. A Deterministic Linear Program Solver in Current Matrix Multiplication Time. SODA, 2020.

[45] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.

[46] Ping Li, et al. Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS). NIPS, 2014.

[47] Peter Dayan, et al. Q-learning. Machine Learning, 1992.

[48] Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. Theoretical Computer Science, 2005.

[49] Yin Tat Lee, et al. Solving linear programs in the current matrix multiplication time. STOC, 2018.

[50] Zhe Wang, et al. Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search. VLDB, 2007.

[51] Yin Tat Lee, et al. An improved cutting plane method for convex optimization, convex-concave games, and its applications. STOC, 2020.

[52] Vasileios Nakos, et al. (Nearly) Sample-Optimal Sparse Fourier Transform in Any Dimension; RIPless and Filterless. FOCS, 2019.

[53] Ruosong Wang, et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning? ICLR, 2020.

[54] Ping Li, et al. Improved Asymmetric Locality Sensitive Hashing (ALSH) for Maximum Inner Product Search (MIPS). UAI, 2014.

[55] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation. COLT, 2019.

[56] Mengdi Wang, et al. Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound. ICML, 2019.

[57] Alexandr Andoni, et al. Approximate near neighbors for general symmetric norms. STOC, 2016.

[58] Anshumali Shrivastava, et al. LSH-Sampling Breaks the Computational Chicken-and-Egg Loop in Adaptive Stochastic Gradient Estimation. ICLR, 2018.

[59] Richard Ryan Williams, et al. Distributed PCP Theorems for Hardness of Approximation in P. FOCS, 2017.

[60] Anshumali Shrivastava, et al. SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems. MLSys, 2019.

[61] Yin Tat Lee, et al. Solving Empirical Risk Minimization in the Current Matrix Multiplication Time. COLT, 2019.

[62] Jianfeng Gao, et al. Deep Reinforcement Learning for Dialogue Generation. EMNLP, 2016.

[63] Sanjiv Kumar, et al. Quantization based Fast Inner Product Search. AISTATS, 2015.

[64] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables. Journal of the American Statistical Association, 1963.

[65] François Le Gall. Powers of tensors and fast matrix multiplication. ISSAC, 2014.

[66] Paris Siminelakis, et al. Kernel Density Estimation through Density Constrained Near Neighbor Search. FOCS, 2020.

[67] Alexandr Andoni, et al. Practical and Optimal LSH for Angular Distance. NIPS, 2015.

[68] Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. Compressed Sensing, 2010.

[69] Trevor Darrell, et al. Nearest-Neighbor Methods in Learning and Vision. IEEE Transactions on Neural Networks, 2008.

[70] Nicole Immorlica, et al. Locality-sensitive hashing scheme based on p-stable distributions. SoCG, 2004.

[71] Sanjiv Kumar, et al. Accelerating Large-Scale Inference with Anisotropic Vector Quantization. ICML, 2019.

[72] Francisco S. Melo, et al. Q-Learning with Linear Function Approximation. COLT, 2007.

[73] Alexander Wei. Optimal Las Vegas Approximate Near Neighbors in ℓp. SODA, 2018.

[74] Xiangyang Ji, et al. Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition. NeurIPS, 2020.

[75] Ping Li, et al. On Efficient Retrieval of Top Similarity Vectors. EMNLP, 2019.

[76] Alexandr Andoni, et al. Approximate Nearest Neighbor Search in High Dimensions. Proceedings of the International Congress of Mathematicians (ICM), 2018.

[77] Chen Luo, et al. Scaling-up Split-Merge MCMC with Locality Sensitive Sampling (LSS). AAAI, 2018.

[78] Anshumali Shrivastava, et al. Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More. MLSys, 2021.

[79] Inderjit S. Dhillon, et al. A Greedy Approach for Budgeted Maximum Inner Product Search. NIPS, 2016.

[80] Richard Peng, et al. Bipartite Matching in Nearly-linear Time on Moderately Dense Graphs. FOCS, 2020.

[81] Oblivious Sketching-based Central Path Method for Solving Linear Programming Problems. 2020.

[82] Ruosong Wang, et al. On Reward-Free Reinforcement Learning with Linear Function Approximation. NeurIPS, 2020.

[83] Richard S. Sutton, et al. Reinforcement Learning: An Introduction. MIT Press, 1998.

[84] Piotr Indyk, et al. Space and Time Efficient Kernel Density Estimation in High Dimensions. NeurIPS, 2019.

[85] Alexandr Andoni. Nearest neighbor search: the old, the new, and the impossible. PhD thesis, 2009.

[86] Jan Peters, et al. Reinforcement learning in robotics: A survey. International Journal of Robotics Research, 2013.

[87] Alexandr Andoni, et al. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. FOCS, 2006.

[88] Alexandr Andoni, et al. Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors. SODA, 2016.

[89] Virginia Vassilevska Williams. Multiplying matrices faster than Coppersmith-Winograd. STOC, 2012.

[90] David P. Woodruff, et al. A Framework for Adversarially Robust Streaming Algorithms. SIGMOD Record, 2020.

[91] Alexandr Andoni, et al. Hölder Homeomorphisms and Approximate Nearest Neighbors. FOCS, 2018.

[92] Tri Dao, et al. MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training. ICLR, 2021.

[93] Artem Babenko, et al. Non-metric Similarity Graphs for Maximum Inner Product Search. NeurIPS, 2018.