Glimpsing speech

The purpose of this paper is to provide further support for the notion of multiple looks, or ‘glimpses’, in everyday speech perception, and to highlight some of the obstacles that confront any system, computational or biological, wishing to exploit them. Moore (2003) argues for an intelligent temporal integration of speech, motivated by the ‘multiple looks’ model of Viemeister and Wakefield (1991). Their model concerned the detection of brief tones in noise, but Moore argues that a similar process might also operate in speech perception. The idea that listeners can integrate glimpses of speech to form linguistic percepts goes back to Miller and Licklider (1950), who demonstrated the high intelligibility of interrupted speech. Informally, a glimpse may be defined as an arbitrary time–frequency region that contains a reasonably undistorted view of the target signal. Section 2 describes studies that contribute to the appeal of glimpsing, while the problem of defining a glimpse is tackled in Section 3. The glimpsing notion has also influenced computational approaches to robust automatic speech recognition (ASR), leading to the development of missing data theory (Cooke, Green, & Crawford, 1994; Cooke, Green, Josifovski, & Vizinho, 2001). A recent implementation of these ideas (Barker, Cooke, & Green, 2001) was among the top performers in the 2001 AURORA global evaluation of robust ASR. Computational aspects of glimpsing are discussed further in Section 4.
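The informal definition above — a time–frequency region containing a reasonably undistorted view of the target — is commonly operationalised in the missing-data and glimpsing literature as a local signal-to-noise ratio criterion. The following is a minimal sketch of that idea, assuming separate target and noise power spectrograms are available; the function name and the 3 dB threshold are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def glimpse_mask(target_power, noise_power, threshold_db=3.0):
    """Binary time-frequency mask marking 'glimpses': cells where the
    target's local SNR exceeds a threshold (3 dB here is an
    illustrative choice, not a value prescribed by the paper)."""
    local_snr_db = 10.0 * np.log10(target_power / noise_power)
    return local_snr_db > threshold_db

# Toy example: a 2x3 spectro-temporal grid of power values.
target = np.array([[4.0, 1.0, 8.0],
                   [0.5, 2.0, 0.25]])
noise = np.ones_like(target)  # unit noise power in every cell

mask = glimpse_mask(target, noise)
# Cells where target power exceeds noise by more than 3 dB
# (i.e. a power ratio of about 2) are marked as glimpsed.
```

In practice the noise spectrogram is not directly observable, which is precisely one of the obstacles the paper highlights: a glimpsing system must estimate where the target dominates without separate access to target and masker.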

[1] Sarah Hawkins, et al. Polysp: a polysystemic, phonetically-rich approach to speech understanding, 2001.

[2] Phil D. Green, et al. Handling missing data in speech recognition, 1994, ICSLP.

[3] N. Viemeister, et al. Temporal integration and multiple looks, 1991, The Journal of the Acoustical Society of America.

[4] Hervé Glotin, et al. A new SNR-feature mapping for robust multistream speech recognition, 1999.

[5] Brian C. J. Moore, et al. Temporal integration and context effects in hearing, 2003, J. Phonetics.

[6] G. A. Miller, et al. The intelligibility of interrupted speech, 1950, The Journal of the Acoustical Society of America.

[7] R. Drullman, et al. Temporal envelope and fine structure cues for speech intelligibility, 1994, The Journal of the Acoustical Society of America.

[8] Richard Lippmann, et al. Accurate consonant perception without mid-frequency speech energy, 1996, IEEE Trans. Speech Audio Process.

[9] Birger Kollmeier, et al. Estimation of the signal-to-noise ratio with amplitude modulation spectrograms, 2002, Speech Commun.

[10] Peter F. Assmann, et al. The perception of speech under adverse conditions, 2004.

[11] Louis C. W. Pols, et al. Perisegmental speech improves consonant and vowel identification, 1999, Speech Communication.

[12] Ljubomir Josifovski, et al. Robust automatic speech recognition with missing and unreliable data, 2003.

[13] Richard Lippmann, et al. Speech recognition by machines and humans, 1997, Speech Commun.

[14] Hideki Kawahara, et al. Missing-data model of vowel identification, 1999, The Journal of the Acoustical Society of America.

[15] J. Culling, et al. Perceptual and computational separation of simultaneous vowels: cues arising from low-frequency beating, 1994, The Journal of the Acoustical Society of America.

[16] Jon Barker, et al. Robust ASR based on clean speech models: an evaluation of missing data techniques for connected digit recognition in noise, 2001, INTERSPEECH.

[17] S. Rosen, et al. Uncomodulated glimpsing in "checkerboard" noise, 1993, The Journal of the Acoustical Society of America.

[18] Malcolm Slaney, et al. A critique of pure audition, 1998.

[19] Sarel van Vuuren, et al. Relevance of time-frequency features for phonetic and speaker-channel classification, 2000, Speech Commun.

[20] R. Drullman, et al. Speech intelligibility in noise: relative contribution of speech elements above and below the noise level, 1995, The Journal of the Acoustical Society of America.

[21] J. Jenkins, et al. Dynamic specification of coarticulated vowels, 1983, The Journal of the Acoustical Society of America.

[22] Daniel P. W. Ellis, et al. The auditory organization of speech and other sources in listeners and computational models, 2001, Speech Commun.

[23] Philipos C. Loizou, et al. The intelligibility of speech with "holes" in the spectrum, 2002, The Journal of the Acoustical Society of America.