An Advanced DSP Algorithm for Music-Less Audio Stream Generation

This paper investigates the separation of the human voice from a mixture of voice and musical instruments. The voice may be the singing voice in a song, or speech in a news broadcast that carries background music. The final output of this work is a file containing only the vocals. The approach operates on stereo audio and processes the signal in the time-frequency domain. In this blind source separation method, the input stereo audio file is divided into frames, each frame is windowed, and the short-time Fourier transform (STFT) is then applied. The signal is demixed by masking it with independent layers of time-frequency filters (TFFs), with one mask defined per layer according to that layer's filtering technique: one layer is a pan TFF and the other is an inter-channel phase difference TFF. Filtering keeps the STFT coefficients estimated to belong to the vocals and sets the rest to zero. After coefficient selection, the signal is reconstructed with the overlap-add (OLA) method to obtain the final output containing only the vocals.
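The abstract does not fix the frame length, hop size, window, pan index, thresholds, or how the two layer masks are combined, so the following is only a minimal, dependency-free sketch under stated assumptions. It uses a periodic Hann window with 50% overlap (so plain overlap-add reconstructs the signal), a pan index (|R| - |L|) / (|R| + |L|), an inter-channel phase difference taken as the angle of L·conj(R), and combines the two layer masks with a logical AND. The names `separate_vocals`, `pan_mask`, `ipd_mask` and the thresholds `pan_width` and `phase_tol` are hypothetical illustrations, not the authors' implementation, and the naive DFT stands in for an FFT purely to keep the example self-contained.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive O(n^2) DFT, used only to keep the sketch dependency-free."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft(); masks are (near) conjugate-symmetric, so keep the real part."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def stft(x, n):
    """Split x into frames (hop n/2, zero-padded ends), window them, take the DFT."""
    hop = n // 2
    n_frames = -(-len(x) // hop) + 1          # ceil(len(x) / hop) + 1
    total = (n_frames + 1) * hop
    padded = [0.0] * hop + list(x) + [0.0] * (total - hop - len(x))
    w = hann(n)
    return [dft([padded[f * hop + t] * w[t] for t in range(n)])
            for f in range(n_frames)]


def istft(frames, n, length):
    """Overlap-add the inverse DFTs; 50%-overlap Hann needs no synthesis window."""
    hop = n // 2
    out = [0.0] * ((len(frames) + 1) * hop)
    for f, spec in enumerate(frames):
        for t, s in enumerate(idft(spec)):
            out[f * hop + t] += s
    return out[hop:hop + length]              # drop the leading zero padding


def pan_mask(l, r, width, eps=1e-12):
    """Pan layer (hypothetical index): -1 = hard left, 0 = centre, +1 = hard right."""
    pan = (abs(r) - abs(l)) / (abs(r) + abs(l) + eps)
    return abs(pan) <= width


def ipd_mask(l, r, tol):
    """IPD layer: keep bins whose inter-channel phase difference is within tol."""
    return abs(cmath.phase(l * r.conjugate())) <= tol


def separate_vocals(left, right, n=32, pan_width=0.2, phase_tol=0.3):
    """Zero each STFT coefficient that fails either layer, then resynthesise."""
    spec_l, spec_r = stft(left, n), stft(right, n)
    out_l, out_r = [], []
    for fl, fr in zip(spec_l, spec_r):
        # Assumption: the two layer masks are combined with a logical AND.
        keep = [pan_mask(a, b, pan_width) and ipd_mask(a, b, phase_tol)
                for a, b in zip(fl, fr)]
        out_l.append([a if k else 0j for a, k in zip(fl, keep)])
        out_r.append([b if k else 0j for b, k in zip(fr, keep)])
    return istft(out_l, n, len(left)), istft(out_r, n, len(right))


if __name__ == "__main__":
    # Toy mix: centre-panned "voice" tone (bin 4) + hard-left "music" tone (bin 12).
    n, length = 32, 256
    voice = [math.cos(2 * math.pi * 4 * t / n) for t in range(length)]
    music = [0.8 * math.cos(2 * math.pi * 12 * t / n) for t in range(length)]
    left = [v + m for v, m in zip(voice, music)]
    vl, vr = separate_vocals(left, voice, n)
    # Compare away from the zero-padded edges, where the tones leak across bins.
    print(max(abs(a - b) for a, b in zip(vl[n:-n], voice[n:-n])))
```

In this toy mix the voice bins have equal magnitudes and zero phase difference in both channels, so both layers keep them, while the hard-left tone has a pan index near -1 and is zeroed; the printed interior error should be at floating-point round-off level. Real mixtures overlap much more in time-frequency, so the thresholds trade residual music against loss of vocal energy.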
