Chapter 6 – Speech Perception within a Biologically Realistic Information-Theoretic Framework

Publisher Summary: Conceptualizing speech perception as a process by which phonemes are retrieved from acoustic signals is traditional. Within this tradition, research in speech perception has often focused on problems concerning segmentation and lack of invariance. The problem of segmentation refers to the fact that if phonetic units exist, they are not like typed letters on a page. Instead, they overlap extensively in time, much like cursive handwriting. The problem of lack of invariance is related to the segmentation problem. Because speech sounds are produced such that articulations for one consonant or vowel overlap with the production of preceding ones and vice versa, every consonant and vowel produced in fluent connected speech is dramatically colored by its neighbors. Some of the most recalcitrant problems in the study of speech perception are the consequence of adopting discrete phonetic units as a level of analysis, a level that is not discrete and may not be real. In connected speech, the acoustic realization of the beginning and end of one word also overlaps with the sounds of preceding and following words; hence the problems of invariance and segmentation are not restricted to phonetic units. Speech perception follows a handful of general principles that are implemented in both sophisticated and not-so-sophisticated ways through the chain of processing from the periphery through the central nervous system.
