Locally Bayesian Learning

John K. Kruschke (kruschke@indiana.edu)
Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405 USA

Abstract

This article is concerned with trial-by-trial, online learning of cue-outcome mappings. In models structured as successions of component functions, an external target can be back-propagated such that the lower layer's target is the input to the higher layer that maximizes the probability of the higher layer's target. Each layer then does locally Bayesian learning. The resulting parameter updating is not globally Bayesian, but can better capture human behavior. The approach is implemented for an associative learning model that first maps inputs to attentionally filtered inputs, and then maps attentionally filtered inputs to outputs. The model is applied to the human-learning phenomenon called highlighting, which is challenging to other extant Bayesian models, including the rational model of Anderson, the Kalman filter model of Dayan and Kakade et al., the noisy-OR model of Tenenbaum and Griffiths et al., and the sigmoid-belief networks of Courville et al. Further details and applications are provided by Kruschke (in press); the present article reports new simulations of the Kalman filter and rational model.

Cognition Modeled as a Succession of Transformations

Cognitive models are often conceived to be successions of transformations from an input representation, through various internal representations, to an output or response representation. Each transformation is a formal operation, typically having various parameter values that are tuned by experience. A well-known example is Marr's (1982) modeling of vision as a succession from a representation of image intensity to a "primal sketch" to a "2½-D sketch" to a 3-D model representation.

Globally Bayesian Learning

In Bayesian approaches to cognitive modeling, each transformation in the hierarchy takes an input and generates a distribution of possible outputs. Figure 1 shows the input x_l at layer l being transformed into the output y_l, which has a probability distribution p(y_l). The input at the first layer is denoted x_1, and the output at the last layer is denoted y_L. The specifics of the distribution are governed by the values of the parameters θ_l. Each value of the parameters θ_l represents a particular hypothesis about how inputs (stimuli) and outputs (outcomes or responses) are related. The combinations of all possible values of θ_l span the possible beliefs of the model. The core ontological notion in Bayesian approaches is that knowledge consists of the degree of belief in each possible value of the parameters θ_l. That distribution of beliefs in each layer is denoted p(θ_l).

The system starts with some prior distribution of belief over the joint hypotheses, p(θ_L, …, θ_1). That distribution is updated each time that an input-output datum is experienced. For input x_1, suppose that the correct outcome, as observed in the environment, is t_L. Bayes' theorem indicates that the appropriate beliefs after witnessing the item ⟨t_L, x_1⟩ are

    p(θ_L, …, θ_1 | t_L, x_1) = p(t_L | θ_L, …, θ_1, x_1) p(θ_L, …, θ_1) / ∫ dθ_L … dθ_1 p(t_L | θ_L, …, θ_1, x_1) p(θ_L, …, θ_1)

The probability of the outcome given the input, p(t_L | θ_L, …, θ_1, x_1), is determined by the particular functions in each layer. The updating of the belief distribution over the joint parameter space is referred to as globally Bayesian learning.

Locally Bayesian Learning

An alternative approach comes from considering the local environment of each layer. Each layer has contact only with its own input and output. If a layer had a specific target and input, then the layer could apply Bayesian updating to its own parameters, without worrying about the other layers. A local updating scheme proceeds as follows. When an input x_1 is presented at the bottom layer, the input is propagated upward through the layers (Figure 1).

[Figure 1 here: two stacked layers. Lower layer: x_l → y_l ∼ p(y_l | θ_l, x_l), with θ_l ∼ p(θ_l). Upper layer: x_{l+1} → y_{l+1} ∼ p(y_{l+1} | θ_{l+1}, x_{l+1}), with θ_{l+1} ∼ p(θ_{l+1}).]

Figure 1. Architecture of successive functions. Vertical arrows indicate a mapping from input to output within a layer, parameterized by θ. The notation "θ ∼ p(θ)" means that θ is distributed according to the probability distribution p(θ). In the globally Bayesian approach, x_{l+1} = y_l. In the locally Bayesian approach, x_{l+1} = ȳ_l.
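The globally Bayesian update can be sketched concretely on a small discrete grid of joint hypotheses, where the integral in the denominator becomes a sum. The two parameter grids and the logistic likelihood below are illustrative assumptions for a toy two-layer mapping, not the paper's model.

```python
import numpy as np

# Illustrative hypothesis grids for a toy two-layer model.
theta1 = np.array([0.2, 0.5, 0.8])    # candidate lower-layer parameters
theta2 = np.array([-1.0, 0.0, 1.0])   # candidate upper-layer parameters

# Joint prior p(theta2, theta1): uniform over the grid.
prior = np.full((theta2.size, theta1.size), 1.0 / (theta2.size * theta1.size))

def likelihood(t, x, th2, th1):
    """Toy p(t | theta2, theta1, x) for the composed mapping: the lower
    layer scales x, the upper layer shifts it, and a logistic squashes
    the result into P(t = 1)."""
    p1 = 1.0 / (1.0 + np.exp(-(th1 * x + th2)))
    return p1 if t == 1 else 1.0 - p1

def global_update(prior, t, x):
    """One globally Bayesian step: reweight every joint hypothesis by the
    likelihood of the observed item <t, x>, then renormalize (the grid
    sum plays the role of the integral in the continuous case)."""
    like = np.array([[likelihood(t, x, th2, th1) for th1 in theta1]
                     for th2 in theta2])
    post = like * prior
    return post / post.sum()

posterior = global_update(prior, t=1, x=2.0)
```

Note that the update is over the joint space: belief in a lower-layer value θ_1 can shift purely because of its interaction with upper-layer values, which is exactly the global coupling that the locally Bayesian scheme gives up.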
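The local scheme can likewise be sketched with toy discrete layers. Everything below is an illustrative assumption (the grids, the sigmoid likelihoods, and the choice of taking ȳ as the lower layer's most probable output); it follows the description in the text: the lower layer's target is the upper-layer input that maximizes the probability of the external target, and each layer then updates its own beliefs by Bayes' rule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy discrete layers; all grids and likelihoods here are assumptions.
xs_mid = np.array([0.0, 1.0])     # candidate intermediate inputs x_{l+1}
theta1 = np.array([0.2, 0.8])     # lower-layer parameter grid
theta2 = np.array([-1.0, 1.0])    # upper-layer parameter grid

def lik_lower(x2, x1, th1):
    """Toy p(x2 | theta1, x1): sigmoid gives P(x2 = 1)."""
    p1 = sigmoid(th1 * x1)
    return p1 if x2 == 1.0 else 1.0 - p1

def lik_upper(t, x2, th2):
    """Toy p(t | theta2, x2): sigmoid gives P(t = 1)."""
    p1 = sigmoid(x2 + th2)
    return p1 if t == 1 else 1.0 - p1

def local_step(x1, t, p_th1, p_th2):
    # Downward pass: the lower layer's target is the upper-layer input
    # that maximizes the predictive probability of the target t.
    scores = [sum(lik_upper(t, x2, th2) * w for th2, w in zip(theta2, p_th2))
              for x2 in xs_mid]
    x2_target = xs_mid[int(np.argmax(scores))]

    # Upward pass: the upper layer's input is the lower layer's predicted
    # output y-bar (here taken as its most probable value).
    preds = [sum(lik_lower(x2, x1, th1) * w for th1, w in zip(theta1, p_th1))
             for x2 in xs_mid]
    y_bar = xs_mid[int(np.argmax(preds))]

    # Each layer then does locally Bayesian updating of its own beliefs.
    post1 = np.array([lik_lower(x2_target, x1, th1) for th1 in theta1]) * p_th1
    post2 = np.array([lik_upper(t, y_bar, th2) for th2 in theta2]) * p_th2
    return post1 / post1.sum(), post2 / post2.sum()

p_th1, p_th2 = local_step(x1=1.0, t=1,
                          p_th1=np.array([0.5, 0.5]),
                          p_th2=np.array([0.5, 0.5]))
```

Because each layer conditions only on its locally constructed target and input, the product p(θ_1)p(θ_2) of the two marginals generally differs from the joint posterior that the globally Bayesian update would produce.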
References

[1] Nozer D. Singpurwalla, et al. Understanding the Kalman filter. 1983.
[2] D. Medin, et al. Problem structure and the use of base-rate information from experience. Journal of Experimental Psychology: General, 1988.
[3] John R. Anderson. The Adaptive Character of Thought. 1990.
[4] D. Medin, et al. Sensitivity to changes in base-rate information. 1991.
[5] David R. Shanks. Connectionist accounts of the inverse base-rate effect in categorization. 1992.
[6] J. Kruschke, et al. ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 1992.
[7] J. Kruschke. Base rates in category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 1996.
[8] John K. Kruschke, et al. Shifting attention in cued recall. 1998.
[9] J. Kruschke, et al. Rules and exemplars in category learning. Journal of Experimental Psychology: General, 1998.
[10] John K. Kruschke, et al. Associative learning in baboons (Papio papio) and humans (Homo sapiens): Species differences in learned attention to visual features. Animal Cognition, 1998.
[11] J. Kruschke, et al. A model of probabilistic category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 1999.
[12] S. Kakade, et al. Learning and selective attention. Nature Neuroscience, 2000.
[13] Peter Dayan, et al. Explaining away in weight space. NIPS, 2000.
[14] P. Juslin, et al. High-level reasoning and base-rate use: Do we need cue-competition to explain the inverse base-rate effect? Journal of Experimental Psychology: Learning, Memory, and Cognition, 2001.
[15] J. Kruschke. Toward a unified model of attention in associative learning. 2001.
[16] D. Marr. Vision. 1982.
[17] S. Kakade, et al. Acquisition and extinction in autoshaping. 2002.
[18] Thomas L. Griffiths, et al. Theory-based causal inference. NIPS, 2002.
[19] S. Kakade, et al. Acquisition and extinction in autoshaping. Psychological Review, 2002.
[20] David S. Touretzky, et al. Model uncertainty in classical conditioning. NIPS, 2003.
[21] Peter Dayan, et al. Uncertainty and learning. 2003.
[22] Stephan Lewandowsky, et al. Population of linear experts: Knowledge partitioning and function learning. Psychological Review, 2004.
[23] Alison Gopnik, et al. Children's causal inferences from indirect evidence: Backwards blocking and Bayesian reasoning in preschoolers. Cognitive Science, 2004.
[24] David S. Touretzky, et al. Similarity and discrimination in classical conditioning: A latent variable account. NIPS, 2004.
[25] Amos Storkey, et al. Advances in Neural Information Processing Systems 20. 2007.
[26] J. Kruschke, et al. Eye gaze and individual differences consistent with learned attention in associative blocking and highlighting. Journal of Experimental Psychology: Learning, Memory, and Cognition, 2005.
[27] J. Kruschke. Locally Bayesian learning with applications to retrospective revaluation and highlighting. Psychological Review, 2006.
[28] R. Sutton. Gain adaptation beats least squares. 2006.