The Relationship between Speech Perception and Auditory Organisation: Studies with Spectrally Reduced Speech

Listeners are remarkably adept at recognising speech that has undergone extensive spectral reduction. Natural speech can be reproduced using as few as three time-varying sinusoids that mimic the corresponding speech formants. Untrained listeners are able to transcribe this 'sine-wave' speech with a high degree of reliability. Sine-wave speech generates phonetic percepts despite an apparent lack of the cues on which low-level grouping processes are believed to operate. Consequently, it has been proposed that speech perception is governed by processes operating independently of those described by auditory scene analysis. This thesis examines the auditory scene analysis account in relation to sine-wave speech perception through a mixture of perceptual and computational studies. It re-examines evidence from previous perceptual studies suggesting that applying a simple grouping cue may increase the intelligibility of sine-wave speech. New evidence is presented from a perceptual study employing stimuli constructed from simultaneous sine-wave speech sources. This study demonstrates that, in conditions closer to those of everyday listening, grouping cues play an important role in the formation of coherent speech percepts. In conjunction with these perceptual studies, results from automatic segregation and recognition tasks suggest that sine-wave speech contains sufficient low-level, non-speech-specific structure to allow partial descriptions of sine-wave sources to be recovered from two-source mixtures. It is argued that these partial descriptions are sufficient to support the limited intelligibility observed in two-source sine-wave speech listening tests. It is shown that the recognition of sine-wave speech may proceed directly from natural speech models if a peak-based representation and a missing-data recognition strategy are employed.
These techniques are also shown to be suitable for the recognition of natural speech in noisy conditions. In conclusion, it is argued that because sine-wave speech possesses residual primitive structure and may allow schema-driven organisation to operate, its perception may be accommodated within the auditory scene analysis account.
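The construction of sine-wave speech described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the synthesis procedure used in the thesis. It assumes that formant frequency and amplitude tracks are already available at a fixed frame rate (for example, from a formant tracker). It interpolates them linearly to the audio sample rate and sums one phase-continuous sinusoid per track. The function names and parameters (`synthesise_sine_wave_speech`, `frame_rate`, `sample_rate`) are hypothetical.

```python
import math
from typing import List, Sequence


def interpolate_track(track: Sequence[float], n_samples: int) -> List[float]:
    """Linearly interpolate a frame-rate parameter track to n_samples values."""
    if len(track) == 1 or n_samples == 1:
        return [float(track[0])] * n_samples
    last = len(track) - 1
    out = []
    for n in range(n_samples):
        pos = n * last / (n_samples - 1)  # position measured in frames
        i = min(int(pos), last - 1)
        frac = pos - i
        out.append(track[i] * (1.0 - frac) + track[i + 1] * frac)
    return out


def synthesise_sine_wave_speech(
    freq_tracks: Sequence[Sequence[float]],
    amp_tracks: Sequence[Sequence[float]],
    frame_rate: float,
    sample_rate: float,
) -> List[float]:
    """Sum one phase-continuous sinusoid per (frequency, amplitude) track.

    freq_tracks[k][j] is the frequency in Hz of tone k at frame j, and
    amp_tracks[k][j] is its linear amplitude. In sine-wave speech the
    tones typically follow the first three formants (hypothetical input here).
    """
    n_frames = len(freq_tracks[0])
    if len(freq_tracks) != len(amp_tracks) or any(
        len(t) != n_frames for t in list(freq_tracks) + list(amp_tracks)
    ):
        raise ValueError("need one amplitude track per frequency track, all equally long")
    n_samples = int(round((n_frames - 1) * sample_rate / frame_rate)) + 1
    signal = [0.0] * n_samples
    for freqs, amps in zip(freq_tracks, amp_tracks):
        f = interpolate_track(freqs, n_samples)
        a = interpolate_track(amps, n_samples)
        phase = 0.0
        for n in range(n_samples):
            signal[n] += a[n] * math.sin(phase)
            # Integrate instantaneous frequency so formant glides have no phase jumps.
            phase += 2.0 * math.pi * f[n] / sample_rate
    return signal
```

Accumulating phase sample by sample, rather than computing `sin(2*pi*f*t)` directly, keeps each tone continuous while its frequency glides. This matters because formant transitions carry much of the phonetic information in sine-wave speech.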
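The missing-data recognition strategy mentioned above can be illustrated with marginalisation, one standard formulation of missing-data recognition. Each spectral feature is labelled reliable or missing, and class likelihoods are computed from the reliable features alone. The sketch below assumes diagonal-covariance Gaussian class models. For these models, integrating out the missing features simply drops their terms from the log-likelihood. The `peak_mask` helper is a hypothetical stand-in for a peak-based representation that keeps only local spectral maxima. It is not claimed to match the representation used in the thesis, and all names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A class model: (means, variances) of a diagonal-covariance Gaussian.
Model = Tuple[Sequence[float], Sequence[float]]


def marginal_log_likelihood(
    x: Sequence[float], reliable: Sequence[bool], mean: Sequence[float], var: Sequence[float]
) -> float:
    """Log-likelihood of x using only the features marked reliable.

    With a diagonal covariance the density factorises over features, so
    marginalising a missing feature removes its factor (adds 0 to the log).
    """
    total = 0.0
    for xi, ok, mu, v in zip(x, reliable, mean, var):
        if ok:
            total -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
    return total


def classify(x: Sequence[float], reliable: Sequence[bool], models: Dict[str, Model]) -> str:
    """Pick the class whose model gives the highest marginal log-likelihood."""
    return max(models, key=lambda label: marginal_log_likelihood(x, reliable, *models[label]))


def peak_mask(spectrum: Sequence[float]) -> List[bool]:
    """Hypothetical peak-based mask: keep only channels that are local maxima."""
    n = len(spectrum)
    return [
        (i == 0 or spectrum[i] >= spectrum[i - 1])
        and (i == n - 1 or spectrum[i] >= spectrum[i + 1])
        for i in range(n)
    ]
```

Consider models `{"a": ([0.0, 0.0], [1.0, 1.0]), "b": ([5.0, 5.0], [1.0, 1.0])}` and the observation `[0.0, 5.0]`. Treating the second channel as missing selects `"a"`, and treating the first channel as missing selects `"b"`. The decision therefore rests entirely on the features labelled reliable.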