Multi-agent Q-learning and regression trees for automated pricing decisions

We study Q-learning, a reinforcement learning algorithm, combined with regression tree function approximation as a way to learn pricing strategies in a competitive marketplace of economic software agents. Q-learning learns to estimate the long-term expected reward for a given state-action pair. In a stationary environment with a lookup table representing the Q-function, the learning procedure is guaranteed to converge to an optimal policy. However, using Q-learning in multi-agent systems presents special challenges. The simultaneous adaptation of multiple agents creates a non-stationary environment for each agent, so there are no theoretical guarantees of convergence or optimality. In addition, large multi-agent systems may have state spaces too large to represent with lookup tables, necessitating the use of function approximation.
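
As an illustration of the lookup-table case described above, the following is a minimal sketch of one tabular Q-learning update. It is not the implementation used in this work. The function name `q_update`, the learning rate `alpha`, the discount factor `gamma`, and the toy price levels are all assumptions introduced for the example.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place; return the new estimate.

    q is a lookup table mapping (state, action) pairs to estimated
    long-term reward. alpha and gamma are illustrative defaults.
    """
    # Best estimated value attainable from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Target: immediate reward plus discounted estimate of future reward.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical pricing setup: the state is a competitor's last price and
# the action is the agent's own price, both drawn from the same levels.
prices = [1, 2, 3]
q = defaultdict(float)  # unseen pairs start at 0.0
q_update(q, state=2, action=1, reward=5.0, next_state=1, actions=prices)
```

The table holds one entry per state-action pair, so its size grows with the product of the state and action spaces. That growth is what motivates replacing the table with a function approximator such as a regression tree.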