Microphone array beamforming for spatial audio object capture.
Microphone arrays can capture a sound scene and can be combined with signal processing to spatially filter or beamform the scene to extract the source of interest by suppressing unwanted sounds.
Microphone array beamforming has been widely used for speech enhancement, giving rise to a vast number of beamforming methods to optimally suppress interfering sounds. However, the opportunities these systems offer in broadcast and consumer audio recording, where wideband capture is a requirement, have not been investigated. In this case, the microphone array design plays a significant role, yet despite the various designs in the literature, it is not clear which geometry provides the best performance under a range of criteria relevant for these applications. Moreover, the interactions between the array geometry, the beamformer and other design parameters, and their impact on both the physical and perceptual quality of extracted audio sources, have not been established.
The main contribution of this thesis is to determine the uniform microphone array design that maximises the quality of extracted audio sources (or objects) from horizontal sound scenes, since most sound scenes have much larger variation in azimuth than elevation. Both physical and perceptual performance evaluations are conducted with a range of microphone geometries and beamforming methods, showing that baffled circular arrays outperform alternative geometries both objectively (in terms of frequency range, spatial resolution, directivity and robustness) and perceptually (based on interference suppression and quality of target and overall sounds). New insights into the interactions between array geometries and beamformers are provided. Moreover, a subjective evaluation of beamforming methods is undertaken, showing the benefits of the on-axis distortionless response in combination with the very high directivity of the superdirective beamformer, particularly for wideband signals.
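To make the superdirective beamformer's two properties (distortionless on-axis response and high directivity) concrete, the following is a minimal single-frequency sketch and not the thesis's implementation. It assumes an open (unbaffled) circular array in free field, a spherically isotropic diffuse noise model, and diagonal loading for robustness; the function names, array radius, microphone count and loading value are all hypothetical choices made for illustration.

```python
import numpy as np

def circular_positions(radius_m, n_mics):
    """Microphone (x, y) positions on a uniform circle (assumed open array)."""
    ang = 2.0 * np.pi * np.arange(n_mics) / n_mics
    return np.stack([radius_m * np.cos(ang), radius_m * np.sin(ang)], axis=1)

def steering_vector(pos, freq_hz, look_rad, c=343.0):
    """Free-field plane-wave steering vector for a horizontal look direction."""
    k = 2.0 * np.pi * freq_hz / c
    u = np.array([np.cos(look_rad), np.sin(look_rad)])
    return np.exp(1j * k * (pos @ u))

def diffuse_coherence(pos, freq_hz, c=343.0):
    """Spherically isotropic noise coherence: sin(k d) / (k d)."""
    k = 2.0 * np.pi * freq_hz / c
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    return np.sinc(k * dist / np.pi)  # np.sinc(x) = sin(pi x) / (pi x)

def superdirective_weights(a, gamma, loading):
    """Minimise diffuse noise power subject to w^H a = 1, with diagonal loading."""
    x = np.linalg.solve(gamma + loading * np.eye(len(a)), a)
    return x / (a.conj() @ x)

def directivity_factor(w, a, gamma):
    """Ratio of on-axis power to diffuse-field power at the beamformer output."""
    return abs(w.conj() @ a) ** 2 / np.real(w.conj() @ gamma @ w)

# Hypothetical example values: 8 microphones, 5 cm radius, 1 kHz.
pos = circular_positions(0.05, 8)
a = steering_vector(pos, 1000.0, 0.0)
gamma = diffuse_coherence(pos, 1000.0)
w_sd = superdirective_weights(a, gamma, loading=1e-3)
w_das = a / len(a)  # delay-and-sum reference, also distortionless on axis

print("on-axis response:", abs(w_sd.conj() @ a))
print("DF superdirective:", directivity_factor(w_sd, a, gamma))
print("DF delay-and-sum:", directivity_factor(w_das, a, gamma))
```

Increasing `loading` moves the solution towards delay-and-sum, trading directivity for robustness to sensor noise and array imperfections.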
In addition to the array geometry, the effects of directivity order and regularisation are further investigated to synthesise frequency-invariant directional responses with the least-squares beamformer. The results exhibit the trade-offs between directivity and robustness with regularisation, and between directivity and frequency range with directivity order. Baffled circular arrays consistently perform best across different orders and regularisation parameters. Furthermore, an optimal regularisation parameter is derived that minimises the error between the target and synthesised responses in the presence of manifold errors, outperforming constant robustness constraints, particularly for gain and positioning errors, whose optimal regularised responses are frequency dependent.
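The regularisation trade-off described above can be illustrated with a small sketch of a regularised (Tikhonov) least-squares beamformer at one frequency. This is an assumption-laden illustration, not the thesis's derivation: it uses an open circular array rather than a baffled one, a hypothetical target pattern of the form (0.5 + 0.5 cos φ)^N for directivity order N, and a fixed regularisation parameter rather than the optimal one the thesis derives. All function names and numeric values are illustrative.

```python
import numpy as np

def steering_matrix(freq_hz, radius_m, n_mics, angles_rad, c=343.0):
    """Rows are free-field steering vectors of an open circular array."""
    k = 2.0 * np.pi * freq_hz / c
    mic_angles = 2.0 * np.pi * np.arange(n_mics) / n_mics
    phase = k * radius_m * np.cos(angles_rad[:, None] - mic_angles[None, :])
    return np.exp(1j * phase)

def target_pattern(angles_rad, look_rad, order):
    """Hypothetical target response: cardioid raised to the directivity order."""
    return (0.5 + 0.5 * np.cos(angles_rad - look_rad)) ** order

def ls_weights(A, d, reg):
    """Solve min_w ||A w - d||^2 + reg * ||w||^2 via the normal equations."""
    gram = A.conj().T @ A + reg * np.eye(A.shape[1])
    return np.linalg.solve(gram, A.conj().T @ d)

def synthesis_error(A, d, w):
    """Relative error between the synthesised and target responses."""
    return np.linalg.norm(A @ w - d) / np.linalg.norm(d)

# Hypothetical example values: 8 microphones, 5 cm radius, 1 kHz, order 2.
angles = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
A = steering_matrix(1000.0, 0.05, 8, angles)
d = target_pattern(angles, 0.0, order=2)

for reg in (1e-6, 1e-2, 1e1):
    w = ls_weights(A, d, reg)
    print(f"reg={reg:g}  error={synthesis_error(A, d, w):.4f}  "
          f"weight norm={np.linalg.norm(w):.4f}")
```

Larger values of `reg` shrink the weight norm, which improves robustness to sensor noise and manifold errors, at the cost of a larger deviation from the target pattern.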
The combination of simulation and perceptual results presented in this thesis represents a significant addition to the beamforming literature, potentially influencing the design of future compact microphone arrays.