TRACX 2.0: A memory-based, biologically-plausible model of sequence segmentation and chunk extraction

Robert M. French (robert.french@u-bourgogne.fr)
LEAD-CNRS UMR 5022, Université de Bourgogne, 21000 Dijon, France

Garrison W. Cottrell (gary@ucsd.edu)
Computer Science and Engineering, UCSD, La Jolla, CA 92093-0404, USA

Abstract

TRACX (French, Addyman, & Mareschal, 2011) is a recursive connectionist system that implicitly extracts chunks from sequence data. It can account for experiments on infant statistical learning and adult implicit learning, as well as real-world phoneme data and an experiment using backward transitional probabilities that simple recurrent networks cannot account for. One criticism of TRACX, however, is the implausibility, in a connectionist model, of if-then-else statements. In particular, one of these statements controls what data is copied from the model's internal memory into its input, based on a hard error threshold. We therefore developed a more biologically plausible version of TRACX devoid of if-then-else statements, relying only on spreading activation and without any learning error threshold. This new model, TRACX 2.0, performs essentially as well as the original TRACX model, has two fewer parameters than the original, and accounts for the graded nature of chunks.

Keywords: chunk extraction; statistical learning; implicit learning; recursive autoassociative memory; autoassociators.

Introduction

No one disputes that individuals learn to extract structure from their sensory environment. There is, however, a heated debate over just how this is done. In what follows we suggest a neurobiologically plausible, memory-based model that achieves this in the auditory domain. The model provides a strong hypothesis as to how people, infants as well as adults, might segment continuous syllable streams into words. The model is an improvement of a recent connectionist memory-based model of sequence segmentation and chunking, TRACX (French, Addyman, & Mareschal, 2011). The new model improves TRACX by removing a crucial if-then-else statement and replacing it with a simple connectionist mechanism.

The mainstream view of how segmentation is done, one that has held sway for nearly two decades, is based on the notion of prediction. This theory supposes that individuals, based on their previous experience with the world, are constantly in the process of making predictions about what is going to happen next in their environment. In so doing, they gradually learn to align their predictions with what actually happens in the world. In order to make these predictions, they must gradually learn the probabilities of successive events in the world. We learn that a flash of lightning will invariably be followed by a clap of thunder, that a "hello" will usually be reciprocated, that a phone call will sometimes be for us but sometimes not, that the flashing light on a police car will usually be for someone else but occasionally for us, and so on.

This is the basis of the transitional probability (TP) theory of sequence segmentation. The idea is simple. In the syllable stream that an infant hears, many multi-syllable words will be repeated frequently (e.g., bay-bee, mah-mee, bah-tul, and so on) and, as a result, the infant will become better at predicting upcoming within-word syllables than upcoming between-word syllables. (The syllable pair bay-bee will be followed by the initial syllable of many different words, whereas bay will very frequently be followed by bee. The infant thus learns the word bay-bee.) Thus, low syllable-to-syllable TPs (failures to predict) indicate word boundaries, while high syllable-to-syllable TPs bind syllables together into words and facilitate their learning.
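To make the TP idea concrete, the short sketch below estimates forward transitional probabilities from a toy syllable stream built out of the words just mentioned and places a boundary wherever the TP dips below its neighbours. This is purely illustrative: it is not part of TRACX, and the boundary-at-local-minimum rule and the toy stream are our own simplifying assumptions.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate forward TPs, P(next syllable | current syllable), from a stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

def segment_at_tp_dips(syllables, tps):
    """Place a word boundary wherever the pairwise TP is lower than both neighbours."""
    pair_tps = [tps[(a, b)] for a, b in zip(syllables, syllables[1:])]
    words, current = [], [syllables[0]]
    for i in range(1, len(syllables)):
        here = pair_tps[i - 1]
        before = pair_tps[i - 2] if i >= 2 else 1.0
        after = pair_tps[i] if i < len(pair_tps) else 1.0
        if here < before and here < after:   # local dip in TP -> word boundary
            words.append("".join(current))
            current = []
        current.append(syllables[i])
    words.append("".join(current))
    return words

# A toy familiarization stream built from the words bay-bee, mah-mee, and bah-tul.
stream = "bay bee mah mee bah tul bay bee bah tul mah mee bay bee".split()
tps = transitional_probabilities(stream)
print(segment_at_tp_dips(stream, tps))   # -> ['baybee', 'mahmee', 'bahtul', ...]
```

On this toy stream the within-word TPs are 1.0 and the between-word TPs are lower, so the dips recover bay-bee, mah-mee, and bah-tul as words.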
An obvious connectionist candidate for this kind of transitional-probability-based learning is the well-known Simple Recurrent Network (SRN; Elman, 1990). While we do not doubt that prediction is an important aspect of cognition, there are other plausible explanations of how infants (and adults) learn to segment continuous speech streams into words. Broadly speaking, there are four classes of models used to explain sequence segmentation and word extraction:

- Predictive connectionist models, most prominent among them the SRN (Elman, 1990; Cleeremans & McClelland, 1991; Servan-Schreiber, Cleeremans, & McClelland, 1991);
- Chunking connectionist models, i.e., TRACX (French et al., 2011);
- Symbolic hybrid models, the best known of which are probably PARSER (Perruchet & Vinter, 1998, 2002) and the Competitive Chunker (Servan-Schreiber & Anderson, 1990);
- Normative statistical models (Frank, Goldwater, Griffiths, & Tenenbaum, 2010; Goldwater, Griffiths, & Johnson, 2009; Börschinger & Johnson, 2011).

Recently, Kurumada, Meylan, and Frank (2013) ran a series
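Returning to the criticism raised in the abstract, the difference between TRACX's hard error threshold and TRACX 2.0's graded re-use of its internal memory can be sketched as follows. This is our own illustrative formulation, not the model's actual update rule: it simply assumes that the previous chunk (hidden) representation and the previous input token are blended in proportion to a tanh-squashed version of the previous recognition error, instead of being switched between by an if-then-else on a fixed threshold.

```python
import numpy as np

def lhs_input_tracx(hidden_prev, token_prev, delta, threshold=0.4):
    """Original TRACX-style hard gate: copy the internal chunk representation
    into the next input only if the previous recognition error fell below a
    fixed threshold (the threshold value here is purely illustrative)."""
    return hidden_prev if delta < threshold else token_prev

def lhs_input_tracx2(hidden_prev, token_prev, delta):
    """TRACX 2.0-style graded gate (our assumed form): no threshold and no
    if-then-else on the error. When the previous pair was recognized well
    (delta near 0) the chunk representation dominates the next input; when
    it was recognized poorly, the raw token does."""
    g = np.tanh(delta)              # squash error into [0, 1) for delta >= 0
    return (1.0 - g) * hidden_prev + g * token_prev
```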
References

[1] Pierre Perruchet, et al. (2008). A role for backward transitional probabilities in word segmentation? Memory & Cognition.
[2] Jordan B. Pollack, et al. (1990). Recursive Distributed Representations. Artificial Intelligence.
[3] Denis Mareschal, et al. (2011). TRACX: A recognition-based connectionist framework for sequence segmentation and chunk extraction. Psychological Review.
[4] Garrison W. Cottrell, et al. (1993). Learning Simple Arithmetic Procedures. Connection Science.
[5] Garrison W. Cottrell, et al. (1990). EMPATH: Face, Emotion, and Gender Recognition Using Holons. NIPS.
[6] Michael C. Frank, et al. (2010). Modeling human performance in statistical word segmentation. Cognition.
[7] Michael C. Frank, et al. (2013). Zipfian frequency distributions facilitate word segmentation in context. Cognition.
[8] E. Rolls, et al. (1998). Neural Networks and Brain Function.
[9] James L. McClelland, et al. (1991). Graded state machines: The representation of temporal contingencies in simple recurrent networks. Machine Learning.
[10] M. Goldsmith, et al. (1996). Statistical Learning by 8-Month-Old Infants.
[11] Martial Mermillod, et al. (2004). The role of bottom-up processing in perceptual categorization by 3- to 4-month-old infants: simulations and data. Journal of Experimental Psychology: General.
[12] G. Cottrell, et al. (1992). Cognitive Binding: A Computational-Modeling Analysis of a Distinction between Implicit and Explicit Memory. Journal of Cognitive Neuroscience.
[13] T. Griffiths, et al. (2009). A Bayesian framework for word segmentation: Exploring the effects of context. Cognition.
[14] M. A. Gluck, et al. (1993). Computational models of the neural bases of learning and memory. Annual Review of Neuroscience.
[15] Jeffrey L. Elman, et al. (1990). Finding Structure in Time. Cognitive Science.
[16] Douglas S. Blank, et al. (1992). Exploring the Symbolic/Subsymbolic Continuum: A Case Study of RAAM.
[17] John R. Anderson, et al. (1990). Learning Artificial Grammars with Competitive Chunking.
[18] R. O'Reilly, et al. (2000). Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain.
[19] Geoffrey E. Hinton, et al. (1985). A Learning Algorithm for Boltzmann Machines. Cognitive Science.
[20] R. French, et al. (2000). A connectionist account of asymmetric category learning in early infancy. Developmental Psychology.
[21] Mark Johnson, et al. (2011). A Particle Filter algorithm for Bayesian word segmentation. ALTA.
[22] S. Lewandowsky, et al. (2002). An endogenous distributed model of ordering in serial recall. Psychonomic Bulletin & Review.
[23] Stephen A. Ritz, et al. (1977). Distinctive features, categorical perception, and probability learning: some applications of a neural model.
[24] A. Vinter, et al. (2002). The self-organizing consciousness. Behavioral and Brain Sciences.
[25] E. N. Sokolov, et al. (1965). Perception and the Conditioned Reflex.