Effects of talker continuity and speech rate on auditory working memory

Speech processing is slower and less accurate when listeners encounter speech from multiple talkers than from one continuous talker. However, interference from multiple talkers has been investigated only with immediate speech recognition or long-term memory recognition tasks. These tasks reveal opposite effects of speech processing time on speech recognition: fast processing of multi-talker speech impedes immediate recognition, but it also results in more abstract and less talker-specific long-term memories for speech. Here, we investigated whether and how processing multi-talker speech disrupts working memory maintenance, an intermediate stage between perceptual recognition and long-term memory. In a digit sequence recall task, listeners encoded seven-digit sequences and recalled them after a 5-s delay. Sequences were spoken by either a single talker or multiple talkers at one of three presentation rates (0-, 200-, and 500-ms inter-digit intervals). Listeners' recall was slower and less accurate for sequences spoken by multiple talkers than for sequences spoken by a single talker. Listeners were especially inefficient at recalling multi-talker sequences at the fastest presentation rate. Our results reveal that talker-specificity effects on speech working memory are most prominent when listeners must rapidly encode speech. These results suggest that, like immediate speech recognition, working memory for speech is susceptible to interference from variability across talkers. While many studies ascribe effects of talker variability to the need to calibrate perception to talker-specific acoustics, these results are also consistent with the idea that a sudden change of talkers disrupts attentional focus, interfering with efficient working-memory processing.
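The task described above crosses two talker conditions (single vs. multiple) with three inter-digit intervals (0, 200, 500 ms), with seven digits per sequence and a 5-s retention delay. A minimal Python sketch of that trial structure follows, together with the inverse efficiency score (IES), one common way of folding speed and accuracy into a single efficiency measure. The abstract does not say how efficiency was computed, how many talkers were used, how digits were sampled, or how often the talker changed within a multi-talker sequence; the four-talker pool, the no-repeat digit sampling, the talker switch at every digit, the use of IES, and all function names here are hypothetical assumptions for illustration.

```python
import random
from dataclasses import dataclass
from itertools import product
from statistics import mean

# Factors and constants stated in the abstract.
TALKER_CONDITIONS = ("single", "multiple")
INTER_DIGIT_INTERVALS_MS = (0, 200, 500)
SEQUENCE_LENGTH = 7
RETENTION_DELAY_S = 5.0

# Hypothetical talker labels: the abstract does not say how many talkers were used.
TALKER_POOL = ("talker_A", "talker_B", "talker_C", "talker_D")


@dataclass
class Trial:
    talker_condition: str
    inter_digit_interval_ms: int
    digits: list
    talkers: list  # one talker label per digit
    retention_delay_s: float = RETENTION_DELAY_S


def make_trial(talker_condition, interval_ms, rng):
    """Build one hypothetical trial for a given cell of the 2 x 3 design."""
    # Assumption: seven distinct digits drawn from 0-9.
    digits = rng.sample(range(10), SEQUENCE_LENGTH)
    if talker_condition == "single":
        # Every digit in the sequence comes from the same talker.
        talkers = [rng.choice(TALKER_POOL)] * SEQUENCE_LENGTH
    else:
        # Assumption: the talker changes at every digit (no two consecutive
        # digits spoken by the same talker).
        talkers = [rng.choice(TALKER_POOL)]
        for _ in range(SEQUENCE_LENGTH - 1):
            others = [t for t in TALKER_POOL if t != talkers[-1]]
            talkers.append(rng.choice(others))
    return Trial(talker_condition, interval_ms, digits, talkers)


def make_design(trials_per_cell, seed=0):
    """Cross talker condition with inter-digit interval, then shuffle trial order."""
    rng = random.Random(seed)
    trials = [
        make_trial(cond, interval, rng)
        for cond, interval in product(TALKER_CONDITIONS, INTER_DIGIT_INTERVALS_MS)
        for _ in range(trials_per_cell)
    ]
    rng.shuffle(trials)
    return trials


def inverse_efficiency(rts_ms, correct):
    """Inverse efficiency score: mean RT of correct trials / proportion correct.

    Higher values mean less efficient performance (slower and/or less accurate).
    """
    correct_rts = [rt for rt, ok in zip(rts_ms, correct) if ok]
    if not correct_rts:
        raise ValueError("IES is undefined when no trial is correct")
    proportion_correct = sum(correct) / len(correct)
    return mean(correct_rts) / proportion_correct
```

For example, `make_design(trials_per_cell=4)` returns 24 shuffled trials, four per cell, and `inverse_efficiency` could then be applied to the recall times and accuracy within each cell. Because IES divides by accuracy, a condition that is both slower and less accurate, as reported here for multi-talker sequences, is penalized on both counts.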