What information is necessary for speech categorization? Harnessing variability in the speech signal by integrating cues computed relative to expectations.

Most theories of categorization emphasize how continuous perceptual information is mapped to categories. Equally important, however, are a model's informational assumptions, that is, the type of information subserving this mapping. This is crucial in speech perception, where the signal is variable and context dependent. This study assessed the informational assumptions of several models of speech categorization, in particular the number of cues that form the basis of categorization and whether these cues represent the input veridically or have undergone compensation. We collected a corpus of 2,880 fricative productions (Jongman, Wayland, & Wong, 2000) spanning many talker and vowel contexts and measured 24 cues for each. A subset was also presented to listeners in an 8AFC phoneme categorization task. We then trained a common classification model based on logistic regression to categorize the fricative from the cue values, and manipulated the information in the training set to contrast (a) models based on a small number of invariant cues, (b) models using all cues without compensation, and (c) models in which cues underwent compensation for contextual factors. Compensation was modeled by computing cues relative to expectations (C-CuRE), a new approach to compensation that preserves fine-grained detail in the signal. Only the compensation model achieved accuracy similar to that of listeners and showed the same effects of context. Thus, even simple categorization metrics can overcome the variability in speech when sufficient information is available and compensation schemes like C-CuRE are employed.
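The idea of computing cues relative to expectations and then categorizing with logistic regression can be illustrated with a small sketch. It is not the study's implementation: the data are synthetic, the function names (`relative_to_expectations`, `fit_softmax`, `predict`) are hypothetical, and treating the "expected" cue value as a least-squares prediction from dummy-coded talker and vowel is an assumption made here for illustration.

```python
import numpy as np

def one_hot(labels):
    """Dummy-code integer labels into an (n, k) indicator matrix."""
    labels = np.asarray(labels)
    return (labels[:, None] == np.unique(labels)[None, :]).astype(float)

def relative_to_expectations(cues, talker, vowel):
    """Return each cue minus the value expected from talker and vowel.

    Assumption: the expectation is a least-squares fit of the cue on
    dummy-coded talker and vowel; the residual is the compensated cue.
    """
    design = np.hstack([np.ones((len(cues), 1)), one_hot(talker), one_hot(vowel)])
    coef, *_ = np.linalg.lstsq(design, cues, rcond=None)
    return cues - design @ coef

def fit_softmax(X, y, n_classes, lr=0.1, steps=500):
    """Multinomial logistic regression fit by plain gradient descent."""
    X = np.hstack([np.ones((len(X), 1)), X])
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]
    for _ in range(steps):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (P - Y) / len(X)
    return W

def predict(W, X):
    """Return the most probable category index for each row of X."""
    X = np.hstack([np.ones((len(X), 1)), X])
    return (X @ W).argmax(axis=1)

# Synthetic stand-in for a cue corpus (NOT the fricative data):
# two categories, ten talkers, three vowels, two cues per token.
rng = np.random.default_rng(0)
n = 600
category = rng.integers(0, 2, n)
talker = rng.integers(0, 10, n)
vowel = rng.integers(0, 3, n)
talker_shift = rng.normal(0.0, 2.0, (10, 2))  # talker-specific offsets
vowel_shift = rng.normal(0.0, 1.0, (3, 2))    # vowel-specific offsets
cues = (category[:, None] * np.array([1.0, -1.0])
        + talker_shift[talker] + vowel_shift[vowel]
        + rng.normal(0.0, 0.5, (n, 2)))

compensated = relative_to_expectations(cues, talker, vowel)

# Same classifier, two informational assumptions: raw vs. compensated cues.
raw_pred = predict(fit_softmax(cues, category, 2), cues)
comp_pred = predict(fit_softmax(compensated, category, 2), compensated)
print("raw cue accuracy:        ", (raw_pred == category).mean())
print("compensated cue accuracy:", (comp_pred == category).mean())
```

Because the compensated cues are residuals rather than discrete labels, within-category detail is retained; only the portion of each cue predictable from context is removed. The classifier is held constant across both runs, so any difference in accuracy reflects the information supplied to it.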

[1]  Peilin Wu,et al.  Preliminary Observations , 1830, The Medico-chirurgical review.

[2]  A. Liberman,et al.  Acoustic Loci and Transitional Cues for Consonants , 1954 .

[3]  G. W. Hughes,et al.  Spectral Properties of Fricative Consonants , 1956 .

[4]  B. C. Griffith,et al.  The discrimination of speech sounds within and across phoneme boundaries. , 1957, Journal of experimental psychology.

[5]  P. Strevens Spectra of Fricative Noise in Human Speech , 1960 .

[6]  R. Luce,et al.  Individual Choice Behavior: A Theoretical Analysis. , 1960 .

[7]  P. Denes On the Motor Theory of Speech Perception , 1965 .

[8]  S. Ohman Coarticulation in VCV utterances: spectrographic measurements. , 1966, The Journal of the Acoustical Society of America.

[9]  M. Posner,et al.  On the genesis of abstract ideas. , 1968, Journal of experimental psychology.

[10]  E. Meltzer A Reconsideration of , 1971 .

[11]  Stephen K. Reed,et al.  Pattern recognition and categorization , 1972 .

[12]  David B Pisoni,et al.  On the identification of place and voicing features in synthetic stop consonants. , 1974, Journal of phonetics.

[13]  D. Pisoni,et al.  Reaction times to comparisons within and across phonetic categories , 1974, Perception & psychophysics.

[14]  H. Winitz,et al.  The distribution of perceptual cues in English prevocalic fricatives. , 1975, Journal of speech and hearing research.

[15]  Jacob Cohen,et al.  Applied multiple regression/correlation analysis for the behavioral sciences , 1979 .

[16]  D. Massaro,et al.  The contribution of fundamental frequency and voice onset time to the /zi/-/si/ distinction. , 1976, The Journal of the Acoustical Society of America.

[17]  S. McKee,et al.  Quantitative studies in retinex theory a comparison between theoretical predictions and observer responses to the “color mondrian” experiments , 1976, Vision Research.

[18]  T. M. Nearey Phonetic feature systems for vowels , 1978 .

[19]  D. Massaro,et al.  Integration of featural information in speech perception. , 1978, Psychological review.

[20]  Douglas L. Medin,et al.  Context theory of classification learning. , 1978 .

[21]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[22]  S. Blumstein,et al.  Invariant cues for place of articulation in stop consonants. , 1978, The Journal of the Acoustical Society of America.

[23]  P. Mermelstein,et al.  On the relationship between vowel and consonant identification when cued by the same acoustic information , 1978, Perception & psychophysics.

[24]  Gregg C. Oden,et al.  Integration of Place and Voicing Information in the Identification of Synthetic Stop Consonants. , 1978 .

[25]  Han-Yong. You An acoustic and perceptual study of English fricatives , 1979 .

[26]  R. Duncan Luce,et al.  Individual Choice Behavior: A Theoretical Analysis , 1979 .

[27]  S. Blumstein,et al.  Perceptual invariance and onset spectra for stop consonants in different vowel environments. , 1980, The Journal of the Acoustical Society of America.

[28]  W. Ganong Phonetic categorization in auditory word perception. , 1980, Journal of experimental psychology. Human perception and performance.

[29]  J. G. Martin,et al.  Perception of anticipatory coarticulation effects. , 1981, The Journal of the Acoustical Society of America.

[30]  Q. Summerfield Articulatory rate and perceptual constancy in phonetic perception. , 1981, Journal of experimental psychology. Human perception and performance.

[31]  S. Soli Second formants in fricatives: Acoustic consequences of fricative‐vowel coarticulation , 1981 .

[32]  V. Mann,et al.  Influence of preceding fricative on stop consonant perception. , 1981, The Journal of the Acoustical Society of America.

[33]  S. Blumstein,et al.  Phonetic features and acoustic invariance in speech , 1981, Cognition.

[34]  D. Whalen Effects of vocalic formant transitions and vowel quality on the English [s]-[š] boundary. , 1981, The Journal of the Acoustical Society of America.

[35]  J. G. Martin,et al.  Perception of anticipatory coarticulation effects in vowel-stop consonant-vowel sequences. , 1982, Journal of experimental psychology. Human perception and performance.

[36]  R. Port,et al.  Consonant/vowel ratio as a cue for voicing in English , 1982, Perception & psychophysics.

[37]  Dominic W. Massaro,et al.  Categorical or continuous speech perception: A new test , 1983, Speech Commun.

[38]  H. Barlow Vision: A computational investigation into the human representation and processing of visual information: David Marr. San Francisco: W. H. Freeman, 1982. pp. xvi + 397 , 1983 .

[39]  S. Blumstein,et al.  A reconsideration of acoustic invariance for place of articulation in diffuse stop consonants: evidence from a cross-language study. , 1981, The Journal of the Acoustical Society of America.

[40]  M. Bornstein,et al.  Discrimination and matching within and between hues measured by reaction times: some implications for categorical perception and levels of information processing , 1984, Psychological research.

[41]  Paul A. Luce,et al.  Time-varying features of initial stop consonants in auditory running spectra: A first report , 1984, Perception & psychophysics.

[42]  C. Fowler Segmentation of coarticulated speech in perception , 1984, Perception & psychophysics.

[43]  D. Homa,et al.  Role of feedback, category size, and stimulus distortion on the acquisition and utilization of ill-defined categories , 1984 .

[44]  A. Liberman,et al.  The motor theory of speech perception revised , 1985, Cognition.

[45]  S. Blumstein,et al.  Limitations of context conditioned effects in the perception of [b] and [w] , 1985, Perception & psychophysics.

[46]  Douglas L. Hintzman,et al.  "Schema Abstraction" in a Multiple-Trace Memory Model , 1986 .

[47]  R. Nosofsky Attention, similarity, and the identification-categorization relationship. , 1986, Journal of experimental psychology. General.

[48]  S R Baum,et al.  Preliminary observations on the use of duration as a cue to syllable-initial fricative consonant voicing in English. , 1987, The Journal of the Acoustical Society of America.

[49]  J. Perkell,et al.  Invariance and variability in speech processes , 1987 .

[50]  L. Cronbach Statistical tests for moderator variables: flaws in analyses recently proposed , 1987 .

[51]  S. Blumstein,et al.  Acoustic characteristics of English voiceless fricatives: a descriptive analysis , 1988 .

[52]  Gregory Ashby,et al.  Toward a Unified Theory of Similarity and Recognition , 1988 .

[53]  T. Crystal,et al.  Segmental durations in connected‐speech signals: Current results , 1988 .

[54]  P. Milenkovic,et al.  Statistical analysis of word-initial voiceless obstruents: preliminary data. , 1988, The Journal of the Acoustical Society of America.

[55]  D. Massaro Testing between the TRACE model and the fuzzy logical model of speech perception , 1989, Cognitive Psychology.

[56]  Douglas H. Whalen,et al.  Vowel and consonant judgments are not independent when cued by the same information , 1989 .

[57]  D H Whalen,et al.  Vowel and consonant judgments are not independent when cued by the same information. , 1987, Perception & psychophysics.

[58]  David W. Hosmer,et al.  Applied Logistic Regression , 1991 .

[59]  J. L. Miller,et al.  Effect of speaking rate on the perceptual structure of a phonetic category , 1989, Perception & psychophysics.

[60]  A. Jongman Duration of frication noise required for identification of English fricatives. , 1989, The Journal of the Acoustical Society of America.

[61]  Terrance M. Nearey,et al.  The segment as a unit of speech perception , 1990 .

[62]  D. H. Whalen,et al.  Perception of overlapping segments: thoughts on Nearey’s model , 1992 .

[63]  S. Blumstein,et al.  Acoustic and perceptual characteristics of voicing in fricatives and fricative clusters. , 1992, The Journal of the Acoustical Society of America.

[64]  T. M. Nearey,et al.  Context Effects in a Double-Weak Theory of Speech Perception , 1992, Language and speech.

[65]  Irving Biederman,et al.  Visual object recognition , 1993 .

[66]  S. Goldinger,et al.  Episodic encoding of voice attributes and recognition memory for spoken words. , 1993, Journal of experimental psychology. Learning, memory, and cognition.

[67]  D. Pisoni,et al.  SPEECH PERCEPTION AS A TALKER-CONTINGENT PROCESS. , 1993, Psychological science.

[68]  J. B. Pickering,et al.  Vowel Perception and Production , 1994 .

[69]  W Marslen-Wilson,et al.  Levels of perceptual representation and process in lexical access: words, phonemes, and features. , 1994, Psychological review.

[70]  B. Ross,et al.  Concepts and Categories , 1994 .

[71]  S. Lipsitz,et al.  Analysis of repeated categorical data using generalized estimating equations. , 1994, Statistics in medicine.

[72]  R. Diehl,et al.  Some Distributional Facts about Fricatives and a Perceptual Explanation , 1994, Phonetica.

[73]  C. Stoel-Gammon,et al.  Cross-Language Differences in Phonological Acquisition: Swedish and American /t/ , 1994, Phonetica.

[74]  Jörgen Pind,et al.  Speaking rate, VOT, and quantity: The search for higher‐order invariants for two Icelandic speech cues , 1994 .

[75]  C A Fowler,et al.  Invariants, specifiers, cues: An investigation of locus equations as information for place of articulation , 1994, Perception & psychophysics.

[76]  S. Blumstein,et al.  The effect of subphonetic differences on lexical access , 1994, Cognition.

[77]  J. Pind,et al.  Speaking rate, voice-onset time, and quantity: The search for higher-order invariants for two Icelandic speech cues , 1995, Perception & psychophysics.

[78]  F. Keil,et al.  Categorical effects in the perception of faces , 1995, Cognition.

[79]  Robert L. Goldstone Effects of Categorization on Color Perception , 1995 .

[80]  B. Lindblom,et al.  Role of articulation in speech perception: clues from production. , 1996, The Journal of the Acoustical Society of America.

[81]  Harvey M. Sussman,et al.  Locus equations as phonetic descriptors of consonantal place of articulation , 1996 .

[82]  J. Ohala,et al.  Speech perception is hearing sounds, not tongues. , 1994, The Journal of the Acoustical Society of America.

[83]  C A Fowler,et al.  Listeners do hear sounds, not tongues. , 1996, The Journal of the Acoustical Society of America.

[84]  Joanne L. Miller Internal Structure of Phonetic Categories , 1997 .

[85]  A. Oliva,et al.  Coarse Blobs or Fine Edges? Evidence That Information Diagnosticity Changes the Perception of Complex Visual Stimuli , 1997, Cognitive Psychology.

[86]  Jennifer S. Pardo,et al.  Perceiving the causes of coarticulatory acoustic variation: Consonant voicing and vowel pitch , 1997, Perception & psychophysics.

[87]  J. Kulikowski,et al.  Colour constancy as a function of hue. , 1997, Acta psychologica.

[88]  P. Schyns,et al.  Categorization creates functional features , 1997 .

[89]  J. Mullennix,et al.  Talker Variability in Speech Processing , 1997 .

[90]  C. J. McGrath,et al.  Effect of exchange rate return on volatility spill-over across trading regions , 2012 .

[91]  T. M. Nearey,et al.  Speech perception as pattern recognition. , 1997, The Journal of the Acoustical Society of America.

[92]  A. Lotto,et al.  General contrast effects in speech perception: Effect of preceding liquid on stop consonant identification , 1998, Perception & psychophysics.

[93]  S. Goldinger Echoes of echoes? An episodic theory of lexical access. , 1998, Psychological review.

[94]  H. Sussman,et al.  Linear correlates in the speech signal: The orderly output constraint , 1998, Behavioral and Brain Sciences.

[95]  Elizabeth A. Strand Uncovering the Role of Gender Stereotypes in Speech Perception , 1999 .

[96]  J. Fritz,et al.  Sensitivity to change. , 1999, Physical therapy.

[97]  D. Pisoni,et al.  Effects of talker, rate, and amplitude variation on recognition memory for spoken words , 1999, Perception & psychophysics.

[98]  Elizabeth A. Strand,et al.  Auditory–visual integration of talker gender in vowel perception , 1999 .

[99]  Konstantinos Koumpis,et al.  Proceedings of the 6th International Conference on Spoken Language Processing , 2000 .

[100]  Allard Jongman,et al.  Acoustic and perceptual properties of English fricatives , 2000, INTERSPEECH.

[101]  A. Jongman,et al.  Acoustic characteristics of English fricatives. , 2000, The Journal of the Acoustical Society of America.

[102]  A. Liberman,et al.  On the relation of speech to language , 2000, Trends in Cognitive Sciences.

[103]  Janet B. Pierrehumbert,et al.  Exemplar dynamics: Word frequency, lenition and contrast , 2000 .

[104]  D. Medin,et al.  Are there kinds of concepts? , 2000, Annual review of psychology.

[105]  Dominic W. Massaro The horse race to language understanding: FLMP was first out of the gate, and has yet to be overtaken , 2000 .

[106]  J. Davidoff,et al.  The categorical perception of colors and facial expressions: The effect of verbal interference , 2000, Memory & cognition.

[107]  C A Fowler,et al.  Perceptual parsing of acoustic consequences of velum lowering from information for vowels , 2000, Perception & psychophysics.

[108]  Robert L. Goldstone,et al.  Altering object representations through category learning , 2001, Cognition.

[109]  M. Tanenhaus,et al.  Subcategorical mismatches and the time course of lexical access: Evidence for lexical competition , 2001 .

[110]  R Smits,et al.  Evidence for hierarchical categorization of coarticulated phonemes. , 2001, Journal of experimental psychology. Human perception and performance.

[111]  D. Gow Assimilation and Anticipation in Continuous Spoken Word Recognition , 2001 .

[112]  R. Smits Hierarchical categorization of coarticulated phonemes: A theoretical analysis , 2001, Perception & psychophysics.

[113]  M. Ernst,et al.  Humans integrate visual and haptic information in a statistically optimal fashion , 2002, Nature.

[114]  R. Jacobs What determines visual cue reliability? , 2002, Trends in Cognitive Sciences.

[115]  Jessica Maye,et al.  Infant sensitivity to distributional information can affect phonetic discrimination , 2002, Cognition.

[116]  James D. Harnsberger,et al.  Language-specific patterns of vowel-to-vowel coarticulation: acoustic structures and their perceptual correlates , 2002, J. Phonetics.

[117]  Kenneth N Stevens,et al.  Toward a model for lexical access based on acoustic landmarks and distinctive features. , 2002, The Journal of the Acoustical Society of America.

[118]  Paul Boersma,et al.  Praat, a system for doing phonetics by computer , 2002 .

[119]  Michelle R. Molis,et al.  Generalizing a neuropsychological model of visual categorization to auditory categorization of vowels , 2002, Perception & psychophysics.

[120]  M. Tanenhaus,et al.  Gradient effects of within-category phonetic variation on lexical access , 2002, Cognition.

[121]  J. Hillenbrand,et al.  A narrow band pattern-matching model of vowel perception. , 2003, The Journal of the Acoustical Society of America.

[122]  Arjan van Hessen,et al.  The end of categorical perception as we know it , 2003, Speech Commun.

[123]  D. Gow Feature parsing: Feature cue mapping in spoken word recognition , 2003, Perception & psychophysics.

[124]  Bart de Boer,et al.  Investigating the role of infant-directed speech with a computer model , 2003 .

[125]  Sarah Hawkins,et al.  Roles and representations of systematic fine phonetic detail in speech understanding , 2003, J. Phonetics.

[126]  Paul Boersma,et al.  Praat: doing phonetics by computer , 2003 .

[127]  J. Pierrehumbert Phonetic Diversity, Statistical Learning, and Acquisition of Phonology , 2003, Language and speech.

[128]  Anne Pier Salverda,et al.  The role of prosodic boundaries in the resolution of lexical embedding in speech comprehension , 2003, Cognition.

[129]  Michael Kiefte,et al.  Sensitivity to change in perception of speech , 2003, Speech Commun.

[130]  S. Hawkins Roles and representations of systematic fine phonetic detail in speech understanding [Journal of Phonetics, 31 (2003) 373-405] , 2004, J. Phonetics.

[131]  Cynthia M Connine,et al.  It’s not what you hear but how often you hear it: On the neglected role of phonological variant frequency in auditory word recognition , 2004, Psychonomic bulletin & review.

[132]  Robert Allen Fox,et al.  Acoustic and spectral characteristics of young children's fricative productions: a developmental perspective. , 2005, The Journal of the Acoustical Society of America.

[133]  D H Whalen,et al.  Perception of pitch location within a speaker's F0 range. , 2005, The Journal of the Acoustical Society of America.

[134]  Morten H. Christiansen,et al.  The differential role of phonological and distributional cues in grammatical categorisation , 2005, Cognition.

[135]  Timothy Koschmann,et al.  Concepts and Categories , 2005 .

[136]  P. Luce,et al.  Examining the time course of indexical specificity effects in spoken word recognition. , 2005, Journal of experimental psychology. Learning, memory, and cognition.

[137]  P. Boersma Praat : doing phonetics by computer (version 4.4.24) , 2006 .

[138]  Benjamin Munson,et al.  The acoustic and perceptual bases of judgments of women and men's sexual orientation from read speech , 2006, J. Phonetics.

[139]  Allard Jongman,et al.  Categorization of sounds. , 2006, Journal of experimental psychology. Human perception and performance.

[140]  K. Gegenfurtner,et al.  Memory modulates color appearance , 2006, Nature Neuroscience.

[141]  L. Holt The mean matters: effects of statistically defined nonspeech spectral distributions on speech categorization. , 2006, The Journal of the Acoustical Society of America.

[142]  B. Munson The Acoustic Correlates of Perceived Masculinity, Perceived Femininity, and Perceived Sexual Orientation , 2007, Language and speech.

[143]  Laurel Fais,et al.  Infant-directed speech supports phonetic category learning in English and Japanese , 2007, Cognition.

[144]  R. Port How are words stored in memory? Beyond phones and phonemes , 2007 .

[145]  James L. McClelland,et al.  Unsupervised learning of vowel categories from infant-directed speech , 2007, Proceedings of the National Academy of Sciences.

[146]  Bob McMurray,et al.  Context Effects on Musical Chord Categorization: Different Forms of Top-Down Feedback in Speech and Music? , 2008, Cogn. Sci.

[147]  Holger Mitterer,et al.  Recalibrating Color Categories Using World Knowledge , 2008, Psychological science.

[148]  Cynthia M Connine,et al.  Processing variant forms in spoken word recognition: The role of variant frequency , 2008, Perception & psychophysics.

[149]  Sarah C. Creel,et al.  Heeding the voice of experience: The role of talker variation in lexical access , 2008, Cognition.

[150]  A. Jongman,et al.  Perception of clear fricatives by normal-hearing and simulated hearing-impaired listeners. , 2008, The Journal of the Acoustical Society of America.

[151]  Richard N Aslin,et al.  Tracking the time course of phonetic cue integration during spoken word recognition , 2008, Psychonomic bulletin & review.

[152]  Michael J. Spivey,et al.  Gradient sensitivity to within-category variation in words and syllables. , 2008, Journal of experimental psychology. Human perception and performance.

[153]  Richard N Aslin,et al.  Statistical learning of phonetic categories: insights from a computational approach. , 2009, Developmental science.

[154]  D. Roberson,et al.  Thresholds for color discrimination in English and Korean speakers , 2009, Cognition.

[155]  Naomi H. Feldman,et al.  The influence of categories on perception: explaining the perceptual magnet effect as optimal statistical inference. , 2009, Psychological review.

[156]  P. Boersma Praat : doing phonetics by computer (version 5.1.05) , 2009 .

[157]  A. Jongman,et al.  Acoustic characteristics of clearly spoken English fricatives. , 2009, The Journal of the Acoustical Society of America.

[158]  M. Tanenhaus,et al.  Within-category VOT affects recovery from "lexical" garden paths: Evidence against phoneme-level inhibition. , 2009, Journal of memory and language.

[159]  C. Fowler,et al.  A critical examination of the spectral contrast account of compensation for coarticulation , 2009, Psychonomic bulletin & review.

[160]  Stephanie Huette,et al.  Continuous dynamics of color categorization , 2010, Psychonomic bulletin & review.

[161]  A. Lotto,et al.  Speech perception as categorization , 2010, Attention, perception & psychophysics.

[162]  Joseph C. Toscano,et al.  Continuous Perception and Graded Categorization , 2010, Psychological science.

[163]  Fabian A. Soto,et al.  Error-driven learning in visual categorization and object recognition: a common-elements model. , 2010, Psychological review.

[164]  Kenneth N. Stevens,et al.  Quantal theory, enhancement and overlap , 2010, J. Phonetics.

[165]  Jennifer Cole,et al.  Unmasking the acoustic effects of vowel-to-vowel coarticulation: A statistical modeling approach , 2010, J. Phonetics.

[166]  Cue integration with categories: A statistical approach to cue weighting and combination in speech perception , 2010 .

[167]  Emergent Information-Level Coupling Between Perception and Production , 2011 .

[168]  Cécile Fougeron,et al.  The Oxford Handbook of Laboratory Phonology , 2011 .

[169]  Cheyenne Munson,et al.  Features as an emergent product of computing perceptual cues relative to expectations , 2011 .

[170]  J. Ohala The listener as a source of sound change , 2012 .