Phonotactic learning without a priori constraints: Arabic root cooccurrence restrictions revisited

1 Introduction

Much work in generative linguistics is nativist in the sense that the fundamental mechanisms for computing linguistic processes are assumed to be innate. In Optimality Theory (OT), for example, the building blocks of grammar, well-formedness constraints, are universal and innate (Prince & Smolensky 1993/2004; McCarthy & Prince 1999). Cross-linguistic differences are accounted for by reranking these fixed universal constraints. While it is fairly certain that some aspects of language are innate in humans, it is also far from clear which aspects are innate and which simply evolve in the natural course of language development. Results from a host of different research paradigms have shown that many language processes can be learned directly from the statistical structure of experience (Elman et al. 1996; Spencer et al. 2009), including nontrivial ones like dependencies between nonadjacent elements (Gómez 2002; Newport & Aslin 2004). Perhaps at least some of the constraints of OT grammars can be learned from experience too.

In a sense, recent work in computational language learning in phonology anticipates this issue. Initial computational work in OT showed that, with a finite set of fixed constraints, complex linguistic systems can be learned within an OT [...] 2009)), modify how constraint-based grammars predict output forms, but they retain the assumption that the constraints themselves are given in advance of learning. More recently, however, Hayes & Wilson (2008) call this assumption into question. In their theory, constraints can be induced from the data by search heuristics that select a small number of highly predictive constraints from a quasi-infinite constraint set. While this approach is used more as an inductive baseline to motivate the introduction of more abstract structures, it is notable in that it makes learning the constraints themselves a nontrivial part of learning.
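To make the idea of inducing constraints from data concrete, the following is a minimal sketch of one simple selection heuristic: score each candidate constraint on the first two root consonants by its observed/expected (O/E) ratio, a common descriptive statistic for cooccurrence, and keep the candidates that are strongly underattested. It is not Hayes & Wilson's procedure, nor the connectionist model developed here. The root list, the coarse place classes, the function names, and the threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical toy roots as (C1, C2, C3) tuples; not taken from any real root list.
ROOTS = [
    ("k", "t", "b"), ("b", "d", "k"), ("t", "b", "k"), ("d", "g", "m"),
    ("f", "k", "t"), ("g", "m", "d"), ("s", "f", "g"), ("m", "t", "k"),
]

# Assumed, deliberately coarse place classes (a simplification for illustration).
PLACE = {
    "b": "labial", "f": "labial", "m": "labial",
    "t": "coronal", "d": "coronal", "s": "coronal",
    "k": "dorsal", "g": "dorsal",
}


def oe_ratios(roots, place):
    """Return O/E for every (C1 class, C2 class) pair.

    O is the number of roots with that class pair in the first two positions;
    E is the count expected if the two positions were independent.
    """
    n = len(roots)
    c1 = Counter(place[r[0]] for r in roots)
    c2 = Counter(place[r[1]] for r in roots)
    observed = Counter((place[r[0]], place[r[1]]) for r in roots)
    ratios = {}
    for a, b in product(c1, c2):
        expected = c1[a] * c2[b] / n  # positive, since a and b both occur
        ratios[(a, b)] = observed[(a, b)] / expected
    return ratios


def induce_constraints(roots, place, threshold=0.5):
    """Select class pairs whose O/E falls below the (assumed) threshold."""
    ratios = oe_ratios(roots, place)
    return sorted(pair for pair, ratio in ratios.items() if ratio < threshold)
```

On this toy data, `induce_constraints(ROOTS, PLACE)` selects exactly the three pairs in which both consonants share a place class, which is the general shape of an OCP-Place restriction. A realistic learner would need a far larger candidate space, some treatment of sampling error for rare classes, and gradient rather than all-or-nothing constraint strengths.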
We seek to continue this line of research by providing an additional mechanism for inducing constraints from data. In particular, we develop a connectionist architecture for learning phonotactic constraints. Below we motivate this cognitive architecture and apply it to the problem of learning root cooccurrence restrictions, or 'OCP effects', in Arabic. Arabic is chosen because large datasets exist, namely root lists and psycholinguistic experiments (Frisch et al.

[1]  S. Frisch, et al.  The Psychological Reality of OCP-Place in Arabic , 2001 .

[2]  J. McCarthy Feature Geometry and Dependency: A Review , 1988 .

[3]  David B Pisoni,et al.  Perception of Wordlikeness: Effects of Segment Probability and Length on the Processing of Nonwords. , 2000, Journal of memory and language.

[4]  R. Gómez Variability and Detection of Invariant Structure , 2002, Psychological science.

[5]  Bruce Tesar,et al.  Using Inconsistency Detection to Overcome Structural Ambiguity in Language Learning , 2000 .

[6]  The Role of Similarity in Hungarian Vowel Harmony , 1992 .

[7]  Michael I. Jordan Serial Order: A Parallel Distributed Processing Approach , 1997 .

[8]  P. Boersma,et al.  Empirical Tests of the Gradual Learning Algorithm , 2001, Linguistic Inquiry.

[9]  E. Newport,et al.  Learning at a distance I. Statistical learning of non-adjacent dependencies , 2004, Cognitive Psychology.

[10]  James L. McClelland,et al.  Distributed memory and the representation of general and specific information. , 1985, Journal of experimental psychology. General.

[11]  Joe Pater The harmonic mind : from neural computation to optimality-theoretic grammar , 2009 .

[12]  P. Smolensky,et al.  Optimality Theory: Constraint Interaction in Generative Grammar , 2004 .

[13]  J. Greenberg The Patterning of Root Morphemes in Semitic , 1950, On Language.

[14]  Mary Hare,et al.  The Role of Similarity in Hungarian Vowel Harmony: a Connectionist Account , 1990 .

[15]  J. Elman,et al.  Rethinking Innateness: A Connectionist Perspective on Development , 1996 .

[16]  P. Smolensky On the proper treatment of connectionism , 1988, Behavioral and Brain Sciences.

[17]  Robert M. Gonyea,et al.  Learning at a Distance : , 2009 .

[18]  Joe Pater,et al.  Weighted Constraints in Generative Linguistics , 2009, Cogn. Sci..

[19]  James L. McClelland,et al.  The TRACE model of speech perception , 1986, Cognitive Psychology.

[20]  Bruce Hayes,et al.  A Maximum Entropy Model of Phonotactics and Phonotactic Learning , 2008, Linguistic Inquiry.

[21]  Géraldine Legendre,et al.  Can Connectionism Contribute to Syntax? Harmonic Grammar, with an Application ; CU-CS-485-90 , 1990 .

[22]  Stefan A. Frisch,et al.  Phonetically Based Phonology: Language processing and segmental OCP effects , 2004 .

[23]  Jane K. Cowan,et al.  Hans Wehr: a dictionary of modern written Arabic , 1977 .

[24]  J. McCarthy The phonetics and phonology of Semitic pharyngeals , 1994 .

[25]  Sharon Rose Rethinking Geminates, Long-Distance Geminates, and the OCP , 2000, Linguistic Inquiry.

[26]  L. Samuelson,et al.  Seeing the world through a third eye: Developmental systems theory looks beyond the nativist-empiricist debate. , 2009, Child development perspectives.

[27]  Alan S. Prince,et al.  Faithfulness and Identity in Prosodic Morphology , 1999 .