Performance of phase transform for detecting sound sources with microphone arrays in reverberant and noisy environments

The performance of sound source location (SSL) algorithms with microphone arrays can be enhanced by processing the signals prior to the delay-and-sum operation. The phase transform (PHAT) has been shown to improve SSL images, especially in reverberant environments. This paper introduces a modification, referred to as the PHAT-β transform, that varies the degree of spectral magnitude information used by the transform through a single parameter, β. Performance results are computed using a Monte Carlo simulation of an eight-element perimeter array, with a receiver operating characteristic (ROC) analysis for detecting single and multiple sound sources. A Fisher's criterion performance measure is also computed for target and noise peak separability and compared to the ROC results. Results show that the standard PHAT significantly improves detection performance for broadband signals, especially in high levels of reverberation noise, and to a lesser degree for noise from other coherent sources. For narrowband targets the PHAT typically results in significant performance degradation; however, the PHAT-β can achieve performance improvements for both narrowband and broadband signals. Finally, the performance for real speech signal samples is examined and shown to exhibit properties similar to both the simulated broadband and narrowband cases, suggesting the use of β values between 0.5 and 0.7 for array applications with general signals.
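As a rough illustration of how a single parameter can control the amount of spectral magnitude information retained, the sketch below applies a partial whitening of the form X(f) / |X(f)|^β to each channel before cross-correlating two microphone signals. The abstract does not state the exact formula, so this form is an assumption: β = 0 leaves the spectrum untouched and β = 1 keeps phase only (the standard PHAT). The function names `partial_whiten` and `estimate_delay`, and the `eps` floor, are hypothetical and chosen for this example only.

```python
import numpy as np

def partial_whiten(X, beta, eps=1e-12):
    """Scale each frequency bin by 1 / |X|**beta (assumed PHAT-beta form).

    beta = 0 keeps the full magnitude; beta = 1 keeps phase only.
    eps is a small floor that avoids division by zero in empty bins.
    """
    return X / np.maximum(np.abs(X), eps) ** beta

def estimate_delay(x1, x2, beta):
    """Return the delay (in samples) of x2 relative to x1.

    The signals are zero-padded so the circular correlation computed
    through the FFT behaves like a linear cross-correlation.
    """
    n = len(x1) + len(x2)
    X1 = partial_whiten(np.fft.rfft(x1, n), beta)
    X2 = partial_whiten(np.fft.rfft(x2, n), beta)
    # Cross-correlation via the inverse FFT of the cross-spectrum.
    cc = np.fft.irfft(X2 * np.conj(X1), n)
    lag = int(np.argmax(cc))
    # Indices above n // 2 correspond to negative lags.
    if lag > n // 2:
        lag -= n
    return lag

# Usage: a broadband (white-noise) source reaching the second
# microphone 5 samples later than the first.
rng = np.random.default_rng(0)
source = rng.standard_normal(1024)
delayed = np.concatenate([np.zeros(5), source[:-5]])
for beta in (0.0, 0.6, 1.0):
    print(beta, estimate_delay(source, delayed, beta))
```

This two-channel delay estimate is only a stand-in for the array processing described above; an SSL image would combine the whitened channels over many candidate source positions rather than a single microphone pair.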
