Solving Continual Combinatorial Selection via Deep Reinforcement Learning

We consider the Markov Decision Process (MDP) of selecting a subset of items at each step, termed the Select-MDP (S-MDP). The large state and action spaces of S-MDPs make them intractable for typical reinforcement learning (RL) algorithms, especially when the number of items is large. In this paper, we present a deep RL algorithm that addresses this issue through the following key ideas. First, we convert the original S-MDP into an Iterative Select-MDP (IS-MDP), which is equivalent to the S-MDP in terms of optimal actions. The IS-MDP decomposes a joint action of selecting K items simultaneously into K iterative selections, which reduces the number of actions at the expense of an exponential increase in the number of states. Second, we overcome this state space explosion by exploiting a special symmetry in IS-MDPs with novel weight-shared Q-networks, which provably maintain sufficient expressive power. Various experiments demonstrate that our approach works well even when the item space is large and that it scales to environments whose item spaces differ from those used in training.
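The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The names `shared_q_values`, `iterative_select`, `w_self`, and `w_ctx` are hypothetical, and a simple linear scorer stands in for the weight-shared Q-network. The sketch assumes each item is a feature vector and that one set of weights scores every item, so relabeling the items only relabels the selection.

```python
import numpy as np
from math import comb


def shared_q_values(items, selected_mask, w_self, w_ctx):
    """Score every item with the SAME weights (hypothetical stand-in for a Q-network).

    items:         (n, d) array of item features
    selected_mask: (n,) boolean array, True for items already selected
    """
    d = items.shape[1]
    # Context summarizes the items chosen so far; a mean is order-independent.
    if selected_mask.any():
        ctx = items[selected_mask].mean(axis=0)
    else:
        ctx = np.zeros(d)
    # Per-item score: own features plus an interaction with the context.
    q = items @ w_self + (items * ctx) @ w_ctx
    # Items already selected cannot be chosen again.
    return np.where(selected_mask, -np.inf, q)


def iterative_select(items, k, w_self, w_ctx):
    """Pick k items one at a time instead of choosing a k-subset jointly."""
    selected_mask = np.zeros(items.shape[0], dtype=bool)
    order = []
    for _ in range(k):
        q = shared_q_values(items, selected_mask, w_self, w_ctx)
        choice = int(np.argmax(q))
        selected_mask[choice] = True
        order.append(choice)
    return order


def joint_action_count(n, k):
    """Number of joint actions when k of n items are chosen at once."""
    return comb(n, k)
```

With 50 items and K = 5, `joint_action_count(50, 5)` gives 2,118,760 joint actions, whereas each iterative step in this sketch chooses among at most 50 candidates. The price is that the state must also record which items have been picked so far, which is the state-space growth the abstract refers to.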
