Simulating bout-and-pause patterns with reinforcement learning

Animal responses follow a specific temporal structure composed of two states: a bout of responding is followed by a long pause until the next bout. Such a bout-and-pause pattern has three components: the bout length, the within-bout response rate, and the bout initiation rate. Previous studies have investigated how these three components are affected by experimental manipulations. However, it remains unknown what underlying mechanisms cause bout-and-pause patterns. In this article, we propose two mechanisms and examine computational models, based on reinforcement learning, that implement them. The first mechanism is choice: an agent chooses between operant and other behaviors. The second mechanism is cost: a cost is associated with changing over from one behavior to another. Both mechanisms are derived from past experimental findings. Simulation results suggested that both the choice and cost mechanisms are required to generate bout-and-pause patterns; if either of them is knocked out, the model does not generate such patterns. We further analyzed the proposed model and found that it reproduced the relationships between experimental manipulations and the three components that have been reported by previous studies. In addition, we showed that alternative models can generate bout-and-pause patterns as long as they implement the two mechanisms.
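
The interplay of the choice and cost mechanisms can be illustrated with a small simulation. The sketch below is not the model from the article: it assumes fixed, equal values for the two behaviors instead of values learned by reinforcement learning, and the names `simulate`, `mean_bout_length`, `cost`, and `beta` are hypothetical. At each time step the agent chooses between the operant behavior and an alternative through a softmax rule, and a changeover cost is subtracted from whichever behavior it is not currently engaged in.

```python
import math
import random

OPERANT, OTHER = 0, 1


def simulate(cost, steps=20000, beta=1.0, values=(0.0, 0.0), seed=0):
    """Return the behavior chosen at each time step (illustrative sketch)."""
    rng = random.Random(seed)
    current = OTHER
    history = []
    for _ in range(steps):
        # Cost mechanism: the behavior that would require a changeover
        # has the cost subtracted from its (assumed, fixed) value.
        utilities = [
            values[a] - (cost if a != current else 0.0)
            for a in (OPERANT, OTHER)
        ]
        # Choice mechanism: two-option softmax, written as a logistic function.
        diff = utilities[OPERANT] - utilities[OTHER]
        p_operant = 1.0 / (1.0 + math.exp(-beta * diff))
        current = OPERANT if rng.random() < p_operant else OTHER
        history.append(current)
    return history


def mean_bout_length(history):
    """Average length of uninterrupted runs of the operant behavior."""
    runs, length = [], 0
    for action in history:
        if action == OPERANT:
            length += 1
        elif length:
            runs.append(length)
            length = 0
    if length:
        runs.append(length)
    return sum(runs) / len(runs) if runs else 0.0


if __name__ == "__main__":
    for cost in (0.0, 3.0):
        bout = mean_bout_length(simulate(cost))
        print(f"cost={cost}: mean operant run length = {bout:.2f}")
```

With the cost set to zero, each step is an independent choice and runs of the operant behavior stay short. With a positive cost, the agent tends to persist in whichever behavior it is engaged in, so operant responses cluster into longer runs separated by longer stretches of the other behavior. This only illustrates the direction of the effect under the stated assumptions; it does not reproduce the reported simulations.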
