On the Properties of the Softmax Function with Application in Game Theory and Reinforcement Learning

In this paper, we utilize results from convex analysis and monotone operator theory to derive additional properties of the softmax function that have not yet been covered in the existing literature. In particular, we show that the softmax function is the monotone gradient map of the log-sum-exp function. By exploiting this connection, we show that the inverse temperature parameter determines the Lipschitz and co-coercivity properties of the softmax function. We then demonstrate the usefulness of these properties through an application in game-theoretic reinforcement learning.
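The numerical sketch below illustrates the two claims in the abstract. It assumes the common parameterization σ_λ(z)_i = exp(λ z_i) / Σ_j exp(λ z_j) together with the scaled log-sum-exp f(z) = (1/λ) log Σ_j exp(λ z_j), for which ∇f = σ_λ. The sketch first uses finite differences to confirm that softmax matches the gradient of f. It then tests, on random pairs (z, w), the inequalities ‖σ_λ(z) − σ_λ(w)‖ ≤ λ‖z − w‖ (λ-Lipschitz) and (σ_λ(z) − σ_λ(w))ᵀ(z − w) ≥ (1/λ)‖σ_λ(z) − σ_λ(w)‖² (1/λ-co-coercive). The specific constants λ and 1/λ are an assumption about how the inverse temperature enters these bounds. The helper names are hypothetical, and the script is a sanity check rather than a proof.

```python
import math
import random


def softmax(z, lam=1.0):
    """Softmax with inverse temperature lam: exp(lam*z_i) / sum_j exp(lam*z_j)."""
    m = max(lam * zi for zi in z)  # subtract the max exponent for numerical stability
    exps = [math.exp(lam * zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_lse(z, lam=1.0):
    """Scaled log-sum-exp (1/lam) * log(sum_j exp(lam*z_j)), computed stably."""
    m = max(lam * zi for zi in z)
    return (m + math.log(sum(math.exp(lam * zi - m) for zi in z))) / lam


def numerical_gradient(f, z, h=1e-6):
    """Central finite-difference gradient of a scalar function f at point z."""
    grad = []
    for i in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[i] += h
        z_minus[i] -= h
        grad.append((f(z_plus) - f(z_minus)) / (2 * h))
    return grad


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def check_pair(z, w, lam, tol=1e-12):
    """Test the lam-Lipschitz and (1/lam)-co-coercivity inequalities for one pair."""
    dz = [a - b for a, b in zip(z, w)]
    ds = [a - b for a, b in zip(softmax(z, lam), softmax(w, lam))]
    lipschitz_ok = math.sqrt(dot(ds, ds)) <= lam * math.sqrt(dot(dz, dz)) + tol
    cocoercive_ok = dot(ds, dz) >= dot(ds, ds) / lam - tol
    return lipschitz_ok, cocoercive_ok


if __name__ == "__main__":
    random.seed(0)
    lam = 2.5  # arbitrary inverse temperature chosen for the demo
    z = [random.uniform(-3, 3) for _ in range(4)]

    # Claim 1: softmax is the gradient of the scaled log-sum-exp.
    grad = numerical_gradient(lambda x: scaled_lse(x, lam), z)
    err = max(abs(a - b) for a, b in zip(grad, softmax(z, lam)))
    print("gradient matches softmax:", err < 1e-6)

    # Claim 2: both bounds hold on randomly sampled pairs of inputs.
    pairs = [([random.uniform(-3, 3) for _ in range(4)],
              [random.uniform(-3, 3) for _ in range(4)]) for _ in range(1000)]
    print("all pairs satisfy both bounds:",
          all(check_pair(a, b, lam) == (True, True) for a, b in pairs))
```

Random sampling can only fail to find a counterexample; it cannot establish the bounds. The small tolerance `tol` absorbs floating-point rounding when σ_λ(z) and σ_λ(w) are nearly equal.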