Combining Simplicity and Likelihood in Language and Music

Rens Bod (rens@science.uva.nl)
Cognitive Science Center Amsterdam, University of Amsterdam
Nieuwe Achtergracht 166, Amsterdam, The Netherlands

Abstract

It is widely accepted that the human cognitive system organizes perceptual input into complex hierarchical descriptions which can be represented by tree structures. Tree structures have been used to describe linguistic, musical and visual perception. In this paper, we will investigate whether there exists an underlying model that governs perceptual organization in general. Our key idea is that the cognitive system strives for the simplest structure (the simplicity principle), but in doing so it is biased by the likelihood of previous experiences (the likelihood principle). We will present a model which combines these two principles by balancing the notion of most likely tree with the notion of shortest derivation. Experiments with linguistic and musical benchmarks (Penn Treebank and Essen Folksong Collection) show that such a combination outperforms models that are based on either simplicity or likelihood alone.

Introduction

It is widely accepted that the human cognitive system organizes perceptual input into complex, hierarchical descriptions which can be represented by tree structures. Tree structures have been used to describe linguistic perception (e.g. Chomsky 1965), musical perception (e.g. Lerdahl & Jackendoff 1983) and visual perception (e.g. Marr 1982). Yet, there seems to be little or no work which emphasizes the commonalities between these different forms of perception and which searches for a general, underlying mechanism which governs all perceptual organization (cf. Leyton 2001). This paper aims to study exactly that question: acknowledging the differences between linguistic, musical and visual information, is there a general, unifying model which can predict the perceived tree structure for sensory input? In studying this question, we will use a strongly empirical methodology: any model that we might hypothesize will be tested against benchmarks such as the linguistically annotated Penn Treebank (Marcus et al. 1993) and the musically annotated Essen Folksong Collection (Schaffrath 1995). While we will argue for a unified model of language, music and vision, we will carry out experiments only with linguistic and musical benchmarks, since no benchmarks of visual tree structures are currently available, to the best of our knowledge.

Figure 1 gives three simple examples of linguistic, musical and visual input with their corresponding tree structures given below. Thus a tree structure describes how parts of the input combine into constituents and how these constituents combine into a representation for the whole input. Note that the linguistic tree structure is labeled with syntactic categories, whereas the musical and visual tree structures are unlabeled. This is because in language there are syntactic constraints on how words can be combined into larger constituents, while in music (and to a lesser extent in vision) there are no such restrictions: in principle any note may be combined with any other note.

[Figure 1: Examples of tree structures. The linguistic example is the sentence "List the sales of products in 1973".]

Apart from these differences, there is also a fundamental commonality: the perceptual input undergoes a process of hierarchical structuring which is not found in the input itself. The main problem is thus: how can we derive the perceived tree structure for a given input? That this problem is not trivial may be illustrated by the fact that the inputs above can also be assigned the following, alternative tree structures in figure 2.

[Figure 2: Alternative tree structures for the inputs of Figure 1 (figure not reproduced).]
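The point that a single input admits more than one tree structure can be made concrete with a small sketch. The two bracketings below are assumptions chosen for illustration (a high and a low attachment of "in 1973"); they are not copied from the paper's figures, and the helper names `leaves` and `spans` are hypothetical.

```python
# Illustrative sketch only: both bracketings are assumed for this example
# and are not taken from the paper's Figure 1 or Figure 2.
# A tree is a tuple (label, child, ...); a leaf is (label, "word").

def leaves(tree):
    """Return the words covered by a tree, left to right."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]
    words = []
    for child in children:
        words.extend(leaves(child))
    return words

def spans(tree, start=0):
    """Return (end, set of (label, start, end)) for all non-leaf nodes."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return start + 1, set()
    end, found = start, set()
    for child in children:
        end, sub = spans(child, end)
        found |= sub
    found.add((label, start, end))
    return end, found

# "in 1973" modifies "the sales of products" (high attachment).
TREE_HIGH = ("S", ("V", "List"),
             ("NP",
              ("NP", ("DT", "the"), ("N", "sales"),
               ("PP", ("P", "of"), ("N", "products"))),
              ("PP", ("P", "in"), ("N", "1973"))))

# "in 1973" modifies "products" (low attachment).
TREE_LOW = ("S", ("V", "List"),
            ("NP", ("DT", "the"), ("N", "sales"),
             ("PP", ("P", "of"),
              ("NP", ("N", "products"),
               ("PP", ("P", "in"), ("N", "1973"))))))

SPANS_HIGH = spans(TREE_HIGH)[1]
SPANS_LOW = spans(TREE_LOW)[1]

# Same input words, different constituent structure.
print(leaves(TREE_HIGH) == leaves(TREE_LOW))
print(sorted(SPANS_HIGH - SPANS_LOW))
print(sorted(SPANS_LOW - SPANS_HIGH))
```

Both trees cover exactly the same word sequence, so nothing in the input itself distinguishes them; a model of perceptual organization has to supply the preference for one structure over the other.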
References

[1] Rens Bod, et al. Memory-Based Models of Melodic Analysis: Challenging the Gestalt Principles. 2002.
[2] N. Chater. The Search for Simplicity: A Fundamental Cognitive Principle? 1999.
[3] Rens Bod, et al. Beyond Grammar: An Experience-Based Theory of Language. 1998.
[4] M. W. Crocker, et al. Wide-Coverage Probabilistic Sentence Processing. Journal of Psycholinguistic Research, 2000.
[5] Joshua Goodman, et al. Efficient Parsing of DOP with PCFG-Reductions (draft). 2001.
[6] John Hale, et al. A Probabilistic Earley Parser as a Psycholinguistic Model. NAACL, 2001.
[7] Eugene Charniak, et al. Statistical Language Learning. 1997.
[8] Rens Bod, et al. Parsing with the Shortest Derivation. COLING, 2000.
[9] F. Restle, et al. Analysis of Ambiguity in Visual Pattern Completion. Journal of Experimental Psychology: Human Perception and Performance, 1983.
[10] Daniel Jurafsky, et al. A Probabilistic Model of Lexical and Syntactic Access and Disambiguation. Cogn. Sci., 1996.
[11] Boris Cormons. Analyse et desambiguisation : une approche a base de corpus (data-oriented parsing) pour les representations lexicales fonctionnelles. 1999.
[12] Eugene Charniak, et al. Statistical Techniques for Natural Language Parsing. AI Mag., 1997.
[13] Ralph Grishman, et al. A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars. HLT, 1991.
[14] Rens Bod. A Memory-Based Model for Music Analysis. 2001.
[15] Peter A. van der Helm, et al. Simplicity versus Likelihood in Visual Perception: From Surprisals to Precisals. 2000.
[16] Eugene Charniak, et al. A Maximum-Entropy-Inspired Parser. ANLP, 2000.
[17] Rens Bod, et al. Using an Annotated Language Corpus as a Virtual Stochastic Grammar. AAAI, 1993.
[18] Rens Bod. What is the Minimal Set of Fragments that Achieves Maximal Parse Accuracy? ACL, 2001.
[19] Paul Gorrell. Syntax and Parsing. 1995.
[20] Noam Chomsky, et al. Aspects of the Theory of Syntax. 1965.
[21] Lyn Frazier, et al. On Comprehending Sentences: Syntactic Parsing Strategies. 1979.