Active hearing, active speaking

A static view of the world permeates most research in speech and hearing. In this idealised situation, sources don’t move and neither do listeners; the acoustic environment doesn’t change; and speakers talk without any influence from the auditory input provided by their own voice or by other speakers. Corpora for speech research and most behavioural tasks have grown to reflect this static viewpoint. Yet it is clear that speech and hearing take place in a world where none of the static assumptions holds, or at least not for long. The dynamic view complicates traditional signal processing approaches, and renders conventional evaluation procedures unrepeatable, since the observer’s own dynamics influence the signals received at the ears. However, the dynamic viewpoint also provides many opportunities for active processes to exploit. Some of these, such as the use of head movements to resolve front-back confusions, are well known, while others exist solely as hypotheses. This paper reviews known and potential benefits of active processes in both hearing and speech production, and goes on to describe two recent studies which demonstrate the value of such processes. The first shows how dynamic cues available to a moving listener can be used to estimate source distance in an acoustic environment. The second demonstrates that the changes in speech production which take place when other speakers are active result in increased glimpsing opportunities at the ear of the interlocutor.
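
The front-back benefit of head movements can be made concrete with a little geometry: under a simple spherical-head model, a source in front and its mirror image behind produce the same interaural time difference, but a small head rotation shifts the ITD in opposite directions for the two hypotheses. The sketch below is a minimal illustration of that logic only; the sinusoidal ITD model, head radius, and noise-free measurements are all assumptions for the example, not material from the paper.

```python
import numpy as np

A, C = 0.0875, 343.0              # assumed head radius (m) and speed of sound (m/s)

def itd(head_rel_azimuth):
    """Toy spherical-head ITD model: depends only on the sine of the
    head-relative azimuth, so a front source at angle a and its mirror
    image at (pi - a) are indistinguishable from a single measurement."""
    return (2 * A / C) * np.sin(head_rel_azimuth)

true_azimuth = np.deg2rad(30.0)   # world azimuth of the (actually frontal) source
rotation     = np.deg2rad(10.0)   # small head turn to the right

itd_before = itd(true_azimuth)
# In reality this would be a second measurement at the ears; here we simulate it.
itd_after = itd(true_azimuth - rotation)

# Predicted post-rotation ITDs under the two competing hypotheses.
front_pred = itd(true_azimuth - rotation)
back_pred  = itd(np.pi - true_azimuth - rotation)

best = "front" if abs(itd_after - front_pred) < abs(itd_after - back_pred) else "back"
print(f"ITD before: {itd_before*1e6:.1f} us, after: {itd_after*1e6:.1f} us -> {best}")
```

A single head turn suffices in this noiseless setting; with real signals the same comparison would be accumulated over time.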
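One dynamic distance cue available to a translating listener is acoustic motion parallax: for a listener moving at speed v past a static source at bearing theta and range d, the bearing changes at rate d(theta)/dt = v * sin(theta) / d, so range can be recovered from the measured bearing rate. The sketch below illustrates this geometry under assumed values (walking speed, source position, noise-free bearings); it is not the binaural estimator described in the paper.

```python
import numpy as np

v = 1.4                          # assumed walking speed (m/s), known to the listener
src = np.array([3.0, 2.0])       # static source position (m); ground truth

# Bearings relative to the direction of travel, sampled as the listener
# walks along the x-axis.
t = np.linspace(0.0, 0.5, 6)
pos = np.stack([v * t, np.zeros_like(t)], axis=1)
rel = src - pos
theta = np.arctan2(rel[:, 1], rel[:, 0])        # bearing at each instant

# Motion parallax: d(theta)/dt = v * sin(theta) / range, hence
# range = v * sin(theta) / (d(theta)/dt).
theta_rate = np.gradient(theta, t)
mid = len(t) // 2
est_range = v * np.sin(theta[mid]) / theta_rate[mid]

true_range = np.linalg.norm(rel[mid])
print(f"estimated range {est_range:.2f} m vs true {true_range:.2f} m")
```

The same relation explains why distance judgements improve when listeners are allowed to move: the bearing rate is informative only when v is non-zero.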
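The glimpsing account of the second study can also be operationalised: a glimpse is a spectro-temporal region where the target speech dominates the background, and a common summary statistic is the proportion of time-frequency cells whose local SNR exceeds a threshold (3 dB is a typical choice). The sketch below computes such a glimpse proportion from separate speech and noise signals; the STFT analysis and threshold are illustrative assumptions, standing in for the auditory-filterbank analysis of the original glimpsing model.

```python
import numpy as np
from scipy.signal import stft

def glimpse_proportion(speech, noise, fs, snr_db=3.0):
    """Fraction of time-frequency cells where speech exceeds noise by
    snr_db.  Uses an STFT rather than an auditory filterbank; window
    settings are illustrative, not taken from the paper."""
    _, _, S = stft(speech, fs, nperseg=512, noverlap=384)
    _, _, N = stft(noise,  fs, nperseg=512, noverlap=384)
    local_snr = 20 * np.log10(np.abs(S) + 1e-12) - 20 * np.log10(np.abs(N) + 1e-12)
    return np.mean(local_snr > snr_db)

# Toy example: a tonal "talker" in white noise.
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 500 * t)
noise = 0.5 * np.random.randn(fs)
print(f"glimpse proportion: {glimpse_proportion(speech, noise, fs):.3f}")
```

On this account, speech modifications that shift energy into regions where a competing talker is momentarily weak raise the interlocutor's glimpse proportion, which is the effect the second study measures.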
