Sound Source Separation Using Spatio-temporal Sound Pressure Distribution Images and Machine Learning

Sound source separation (SSS) using a microphone array is effective in various situations, such as recording target speech in noisy environments. We previously proposed an SSS system that combined a differential-type array with a time-delay neural network (NN). An advantage of the differential-type array was that it prevented the nonlinearity of the NN from distorting the target speech. However, that system was effective only for narrowband signals. To handle broadband signals such as speech, we propose a new SSS system that extends the previous one in two respects. First, the input to the NN is extended to sound pressure distribution images, which are formed from the microphone outputs. Second, the number of layers in the NN is increased. Computer simulations showed that the proposed system achieved higher SSS performance than conventional arrays, including our previous one.
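As a rough illustration of how a sound pressure distribution image could be formed from microphone outputs, the following sketch stacks a short window of each channel into a 2-D array (rows are microphones, columns are time samples). The array geometry, sampling rate, window width, and the function names `simulate_plane_wave` and `spatiotemporal_image` are assumptions made for this example; the abstract does not specify how the images are actually constructed.

```python
import numpy as np


def simulate_plane_wave(n_mics=8, spacing=0.02, angle_deg=30.0,
                        freq=1000.0, fs=16000, n_samples=256, c=343.0):
    """Simulate a sinusoidal plane wave arriving at a uniform linear array.

    All parameter values are illustrative assumptions, not taken from the paper.
    Returns an array of shape (n_mics, n_samples).
    """
    t = np.arange(n_samples) / fs
    # Per-microphone arrival delay for a far-field source at angle_deg
    delays = np.arange(n_mics) * spacing * np.sin(np.deg2rad(angle_deg)) / c
    return np.sin(2.0 * np.pi * freq * (t[None, :] - delays[:, None]))


def spatiotemporal_image(mic_signals, start, width):
    """Cut a (microphone x time) window out of the multichannel signal.

    Each row is one microphone and each column one time sample, so the
    wavefront of a source appears as a slanted stripe pattern in the image.
    """
    mic_signals = np.asarray(mic_signals, dtype=float)
    return mic_signals[:, start:start + width]


signals = simulate_plane_wave()
image = spatiotemporal_image(signals, start=0, width=64)
print(image.shape)
```

In such an image, the slope of the stripes depends on the arrival direction, which is the kind of spatial cue an NN taking the image as input could exploit.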
