Harmonic sound stream segregation using localization and its application to speech stream segregation

Sound stream segregation is essential for understanding auditory events in the real world. In this paper, we present a new method for segregating a series of harmonic sounds, using harmonic structure and sound source direction as cues. Direction information is used to extract the fundamental frequencies of the individual harmonic sounds, and the harmonic sounds are then segregated according to those fundamental frequencies. Sequential grouping of harmonic sounds is achieved by using both sound source directions and fundamental frequencies. We also present an application of harmonic stream segregation to speech stream segregation, which yields effective speech stream segregation with binaural microphones. Experimental results show that the method reduces spectral distortion and fundamental frequency errors compared with an existing monaural system, and that it can segregate three simultaneous harmonic streams with only two microphones.
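The core idea of segregating a harmonic sound by its fundamental frequency can be sketched as follows. This is a minimal illustration, not the paper's implementation: assuming the fundamental frequency of one stream is already known (in the paper it is estimated with the help of direction information), we keep only the spectral bins near integer multiples of that fundamental and resynthesize, which separates one harmonic stream from a mixture.

```python
import numpy as np

def harmonic_mask(freqs, f0, width_hz=10.0):
    """Boolean mask selecting bins within width_hz of each harmonic of f0."""
    mask = np.zeros_like(freqs, dtype=bool)
    k = 1
    while k * f0 <= freqs[-1]:
        mask |= np.abs(freqs - k * f0) <= width_hz
        k += 1
    return mask

fs = 16000
t = np.arange(fs) / fs  # one second of signal

# Two concurrent harmonic sounds with different fundamentals
# (hypothetical test signals, not the paper's data).
f0_a, f0_b = 200.0, 310.0
sig_a = sum(np.sin(2 * np.pi * k * f0_a * t) for k in range(1, 5))
sig_b = sum(np.sin(2 * np.pi * k * f0_b * t) for k in range(1, 5))
mix = sig_a + sig_b

# Select the harmonics of stream A in the frequency domain and resynthesize.
spec = np.fft.rfft(mix)
freqs = np.fft.rfftfreq(len(mix), 1 / fs)
mask_a = harmonic_mask(freqs, f0_a)
seg_a = np.fft.irfft(np.where(mask_a, spec, 0.0), n=len(mix))

# The segregated stream should match source A and reject source B.
corr_a = np.corrcoef(seg_a, sig_a)[0, 1]
corr_b = np.corrcoef(seg_a, sig_b)[0, 1]
```

The segregated signal `seg_a` correlates strongly with source A and only weakly with source B, illustrating why accurate fundamental frequency estimation (the role of the direction cue in the paper) is the key to harmonic stream segregation.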
