A Psychophysical Imaging Method Evidencing Auditory Cue Extraction during Speech Perception: A Group Analysis of Auditory Classification Images

Although there is broad consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation of continuous acoustic properties into discrete perceptual units remain undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method, the Auditory Classification Image technique, that allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses about participants' listening strategies at the group level. We exemplify this approach by identifying the acoustic cues involved in da/ga categorization in two phonetic contexts, Al- or Ar-. Applying Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme.
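The core idea behind a classification image can be illustrated with a toy simulation. The sketch below is not the paper's GLM-based group method; it uses the classic reverse-correlation estimate (difference between the mean noise fields preceding each of the two responses) on simulated data, with an arbitrary 16 x 20 time-frequency grid and one hypothetical critical cell assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each trial adds a Gaussian noise field over a
# 16 x 20 time-frequency grid to the stimulus; the simulated listener
# reports "da" (1) whenever the noise energy in one critical cell,
# plus some internal noise, is high.
n_trials, n_freqs, n_times = 5000, 16, 20
noise = rng.standard_normal((n_trials, n_freqs, n_times))
critical = (6, 4)  # assumed critical time-frequency cell
decision_var = noise[:, critical[0], critical[1]] \
    + 0.5 * rng.standard_normal(n_trials)
resp = (decision_var > 0).astype(int)

# Basic classification image (Ahumada-style reverse correlation):
# mean noise field on "da" trials minus mean on "ga" trials.
aci = noise[resp == 1].mean(axis=0) - noise[resp == 0].mean(axis=0)

# The largest weight should recover the critical cell.
peak = np.unravel_index(np.abs(aci).argmax(), aci.shape)
print(peak)  # → (6, 4)
```

In practice, raw reverse correlation needs far more trials than a penalized GLM with a smoothness prior, which is why regularized estimators are preferred for real listeners.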
