Drawing inspiration from behavioral studies of human decision making, we propose a general parametric framework for reinforcement learning that extends the standard Q-learning approach with a two-stream model of reward processing, incorporating biases biologically associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. For the AI community, developing agents that react differently to different types of rewards can help us understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems. Moreover, from the behavioral-modeling perspective, our parametric framework can be viewed as a first step toward a unifying computational model that captures reward-processing abnormalities across multiple mental conditions, as well as user preferences in long-term recommendation systems.
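The two-stream idea can be sketched as follows: positive and negative rewards update separate Q-tables, each with its own learning rate and weight, and actions are selected from a weighted combination of the two streams. This is a minimal illustrative sketch, not the paper's exact formulation; all parameter names (`lr_pos`, `lr_neg`, `w_pos`, `w_neg`) and the specific update rule are assumptions for exposition.

```python
import random


class SplitQAgent:
    """Hypothetical two-stream ("split") Q-learning sketch.

    Positive rewards update q_pos, negative rewards update q_neg;
    asymmetric weights (w_pos, w_neg) can model biased reward
    processing, e.g. heightened sensitivity to losses.
    """

    def __init__(self, n_states, n_actions,
                 lr_pos=0.1, lr_neg=0.1,   # per-stream learning rates
                 w_pos=1.0, w_neg=1.0,     # per-stream weights (bias parameters)
                 gamma=0.95, epsilon=0.1):
        self.n_actions = n_actions
        self.lr_pos, self.lr_neg = lr_pos, lr_neg
        self.w_pos, self.w_neg = w_pos, w_neg
        self.gamma, self.epsilon = gamma, epsilon
        self.q_pos = [[0.0] * n_actions for _ in range(n_states)]
        self.q_neg = [[0.0] * n_actions for _ in range(n_states)]

    def value(self, s, a):
        # Combined value: weighted sum of the two streams.
        return self.w_pos * self.q_pos[s][a] + self.w_neg * self.q_neg[s][a]

    def act(self, s):
        # Epsilon-greedy action selection on the combined value.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.value(s, a))

    def update(self, s, a, r, s_next):
        # Route the TD update to the stream matching the reward's sign.
        best = max(range(self.n_actions), key=lambda a2: self.value(s_next, a2))
        if r >= 0:
            target = r + self.gamma * self.q_pos[s_next][best]
            self.q_pos[s][a] += self.lr_pos * (target - self.q_pos[s][a])
        else:
            target = r + self.gamma * self.q_neg[s_next][best]
            self.q_neg[s][a] += self.lr_neg * (target - self.q_neg[s][a])
```

Under this parameterization, setting, say, `w_neg` much larger than `w_pos` would make the agent loss-averse, while dampening one stream's learning rate would make it insensitive to that reward type, giving a simple knob-per-bias view of the framework.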