The dynamics of operant conditioning.

Existing models of operant learning are relatively insensitive to historical properties of behavior and apply to only limited data sets. This article proposes a minimal set of principles based on short-term and long-term memory mechanisms that can explain the major static and dynamic properties of operant behavior in both single-choice and multiresponse situations. The critical features of the theory are as follows: (a) The key property of conditioning is assessment of the degree of association between responses and reinforcement and between stimuli and reinforcement; (b) the contingent reinforcement is represented by learning expectancy, which is the combined prediction of response-reinforcement and stimulus-reinforcement associations; (c) the operant response is controlled by the interplay between facilitatory and suppressive variables that integrate differences between expected (long-term) and experienced (short-term) events; and (d) very-long-term effects are encoded by a consolidated memory that is sensitive to the entire reinforcement history. The model predicts the major qualitative features of operant phenomena and suggests an experimental test of theoretical predictions about the joint effects of reinforcement probability and amount of training on operant choice. We hypothesize that the set of elementary principles that we propose may help resolve the long-standing debate about the fundamental variables controlling operant conditioning.
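
As a rough illustration of feature (c), the comparison between experienced (short-term) and expected (long-term) events can be sketched with two leaky averages of the reinforcement sequence. This is not the article's model: the update rule, the rates `fast_rate` and `slow_rate`, and the names `facilitation` and `suppression` are assumptions chosen only to show how a mismatch between a fast and a slow memory trace could push responding up or down.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    short_term: float    # fast trace: recently experienced reinforcement
    long_term: float     # slow trace: expected reinforcement
    facilitation: float  # experienced exceeds expected
    suppression: float   # experienced falls short of expected


def leaky_average(previous: float, sample: float, rate: float) -> float:
    """Move the running average a fraction `rate` toward the new sample."""
    return previous + rate * (sample - previous)


def simulate(rewards: List[float],
             fast_rate: float = 0.3,    # hypothetical short-term rate
             slow_rate: float = 0.02    # hypothetical long-term rate
             ) -> List[Step]:
    """Track both traces and split their difference into two signed parts."""
    short_term = 0.0
    long_term = 0.0
    trace = []
    for reward in rewards:
        short_term = leaky_average(short_term, reward, fast_rate)
        long_term = leaky_average(long_term, reward, slow_rate)
        mismatch = short_term - long_term
        trace.append(Step(short_term, long_term,
                          max(mismatch, 0.0), max(-mismatch, 0.0)))
    return trace


# 200 reinforced trials followed by 50 extinction trials.
trace = simulate([1.0] * 200 + [0.0] * 50)
```

In this toy version the facilitatory term is positive early in acquisition, when recent experience runs ahead of expectation, and the suppressive term is positive in extinction, when the slow trace still expects reinforcement that no longer arrives.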
