Paper information - Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces

We propose a computationally efficient algorithm that combines compressed sensing with imitation learning to solve text-based games with combinatorial action spaces. Specifically, we introduce a new compressed sensing algorithm, IK-OMP, which can be seen as an extension of Orthogonal Matching Pursuit (OMP). We incorporate IK-OMP into a supervised imitation learning setting and show that the combined approach, Sparse Imitation Learning (Sparse-IL), solves the entire text-based game of Zork1, which has an action space of approximately 10 million actions, given both perfect and noisy demonstrations.
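
For readers unfamiliar with the baseline that IK-OMP extends, the following is a minimal sketch of plain Orthogonal Matching Pursuit. It is not the paper's IK-OMP, and every name in it (`omp`, `D`, `y`, `k`, the toy dictionary) is an assumption made for illustration. Random unit vectors stand in for word embeddings, and the measured signal is the sum of a few of them, which loosely mirrors recovering the words of a command from a summed embedding.

```python
import numpy as np

def omp(D, y, k):
    """Plain OMP: greedily find a k-sparse x such that D @ x approximates y."""
    support = []                      # indices of the atoms selected so far
    residual = y.astype(float).copy()
    coef = np.zeros(0)
    for _ in range(k):
        # Score each atom by its absolute correlation with the residual.
        scores = np.abs(D.T @ residual)
        if support:
            scores[support] = -np.inf  # never select the same atom twice
        support.append(int(np.argmax(scores)))
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x, sorted(support)

# Toy setup (hypothetical): 50 "words", each a random unit vector in 300 dimensions.
rng = np.random.default_rng(0)
D = rng.standard_normal((300, 50))
D /= np.linalg.norm(D, axis=0)

true_words = [3, 17, 25]              # indices of the words in the "command"
y = D[:, true_words].sum(axis=1)      # summed embedding observed by the decoder

x_hat, support = omp(D, y, k=3)
print("recovered word indices:", support)
```

The least-squares re-fit over the whole support is what distinguishes OMP from simple matching pursuit: earlier coefficients are corrected each time a new atom is added. According to the abstract, IK-OMP builds on this procedure; the sketch above does not attempt to reproduce that extension.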
