Sequential stream segregation of voiced and unvoiced speech sounds based on fundamental frequency

Differences in fundamental frequency (F0) between voiced sounds are known to be a strong cue for stream segregation. However, speech consists of both voiced and unvoiced sounds, and less is known about whether and how the unvoiced portions are segregated. This study measured listeners' ability to integrate or segregate sequences of consonant‐vowel tokens, comprising a voiceless fricative and a vowel, as a function of the F0 difference between interleaved sequences of tokens. A performance‐based measure was used, in which listeners detected the presence of a repeated token either within one sequence or between the two sequences (measures of voluntary and obligatory streaming, respectively). The results showed a systematic increase of voluntary stream segregation as the F0 difference between the two interleaved sequences increased from 0 to 13 semitones, suggesting that F0 differences allowed listeners to segregate speech sounds, including the unvoiced portions. In contrast to the consistent effects of voluntary streaming, the trend towards obligatory stream segregation at large F0 differences failed to reach significance. Listeners were no longer able to perform the voluntary‐streaming task reliably when the unvoiced portions were removed from the stimuli, suggesting that the unvoiced portions were used and correctly segregated in the original task. The results demonstrate that streaming based on F0 differences occurs for natural speech sounds, and that the unvoiced portions are correctly assigned to the corresponding voiced portions.
Highlights
• The stimuli used in the study consisted of an unvoiced fricative consonant and a voiced vowel (CV token).
• Listeners could use a difference in F0 to segregate alternating CV tokens.
• Evidence for both obligatory and voluntary stream segregation was found.
• Listeners did not base their judgments on the vowel part only or the consonant part only.
• Listeners were no longer able to perform the task without the fricative part of the stimuli.
• Listeners were able to segregate the whole tokens based on ΔF0 despite the lack of F0 cues in the fricative part.
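
The F0 differences above are expressed in semitones. As a minimal sketch of what that unit means in frequency terms, the following converts a semitone interval into a frequency ratio using the standard equal-tempered relation (ratio = 2^(n/12)). The function names and the 100 Hz base F0 are hypothetical, chosen only for illustration; the actual F0 values of the stimuli are not given here.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Return the frequency ratio for an interval given in semitones.

    Uses the standard equal-tempered definition: 12 semitones = one octave.
    """
    return 2.0 ** (semitones / 12.0)


def shifted_f0(base_f0_hz: float, semitones: float) -> float:
    """Return the F0 (in Hz) that lies `semitones` above `base_f0_hz`."""
    return base_f0_hz * semitones_to_ratio(semitones)


if __name__ == "__main__":
    base_f0_hz = 100.0  # hypothetical base F0, not taken from the study
    for delta in (0, 13):  # endpoints of the range reported in the abstract
        print(delta, "semitones ->", round(shifted_f0(base_f0_hz, delta), 2), "Hz")
```

Under this definition, the largest difference reported (13 semitones) is slightly more than an octave, a frequency ratio of roughly 2.12.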
