Primary-Ambient Source Separation for Upmixing to Surround Sound Systems

Extracting spatial information from an audio recording is a necessary step in upmixing stereo tracks for playback on surround systems. One important spatial feature is the perceived direction of each audio source in the recording, which determines how that source is remixed for the surround system. This paper focuses on separating two types of audio sources: primary (direct) and ambient (surrounding) sources. Several approaches to this problem have been proposed, based mainly on the correlation between the two channels of the stereo recording. We propose a new approach that trains a neural network to identify and extract the two sources from a stereo track. In subjective and objective evaluations against common methods from the literature, the proposed approach improves separation accuracy while remaining computationally attractive for real-time applications.
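To make the correlation-based family of methods mentioned above concrete, here is a minimal sketch of one generic baseline, a principal-component split of the two channels. It is not the neural-network approach proposed in this paper; the function names `pca_primary_ambient` and `interchannel_correlation` are hypothetical, and the sketch assumes a single dominant primary source and works on the whole signal rather than on short time-frequency frames.

```python
import numpy as np

def interchannel_correlation(stereo):
    """Normalized correlation between left and right channels (illustrative helper)."""
    left, right = np.asarray(stereo, dtype=float)
    denom = np.linalg.norm(left) * np.linalg.norm(right) + 1e-12  # avoid divide-by-zero
    return float(np.dot(left, right) / denom)

def pca_primary_ambient(stereo):
    """Split a (2, n_samples) stereo signal into primary and ambient estimates.

    Assumption: the primary component is the part of the signal lying along
    the dominant inter-channel direction; the ambient estimate is the residual.
    """
    x = np.asarray(stereo, dtype=float)
    cov = x @ x.T / x.shape[1]               # 2x2 inter-channel covariance
    _, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    u = eigvecs[:, -1:]                      # principal direction, shape (2, 1)
    primary = u @ (u.T @ x)                  # projection onto principal direction
    ambient = x - primary                    # what the projection leaves behind
    return primary, ambient
```

For a single amplitude-panned source with no added ambience, the two channels are fully correlated, so this split assigns essentially everything to the primary estimate. In practice such methods are usually applied per frame and per frequency band.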
