Roles and representations of systematic fine phonetic detail in speech understanding

Abstract: This paper aims to show that we can better explain how people understand speech by shifting the focus of inquiry from the abstraction of formal units of linguistic analysis to a detailed analysis of global aspects of the communicative situation, of which speech is just one part. It draws on evidence for (a) the communicative importance of fine phonetic detail and (b) exemplar memory for speech to explore the idea that, at least in certain normal, easy conversations, a listener may interpret the meaning of an utterance directly from the global sound pattern. On this view, reference to formal linguistic units of analysis, such as phonemes, words, and grammar, is incidental: circumstances dictate whether such reference takes place at all and, if it does, whether it occurs before, after, or simultaneously with the construction of meaning. The position implies that speech perception demands not early reference to abstract linguistic units but flexible, dynamic organization of multi-modal (and modality-specific) memories, and that models of speech perception should reflect both the multi-purpose function of phonetic information and the polysystemic nature of speech within language. A preliminary model that reflects this theoretical position, Polysp, is described. Polysp has intellectual antecedents in Hebbian principles and current relevance to adaptive resonance theory (ART). Neuronal bases for the initial processing of exemplars are briefly discussed. Hierarchical and more abstract processing arises in an ART-like, self-organizing dynamic system in which, once processing has begun, the sensory input is not effectively distinguishable from top-down knowledge. In this system, understanding meaning is more important than identifying linguistic structure, and processing is strongly guided by rhythmic and attentional factors.
