Seq2Tree: A Tree-Structured Extension of LSTM Network

The Long Short-Term Memory (LSTM) network has attracted much attention for sequence modeling tasks because of its ability to preserve longer-term information in a sequence than ordinary Recurrent Neural Networks (RNNs). The basic LSTM assumes a chain structure over the input sequence. However, audio streams often combine phonemes into larger meaningful units: words in a speech processing task, or a particular type of noise in a signal and noise separation task. We introduce the Seq2Tree network, a modification of the LSTM that constructs a tree structure over an input sequence. Experiments show that the Seq2Tree network outperforms the state-of-the-art Bidirectional LSTM (BLSTM) model on a signal and noise separation task, the CHiME Speech Separation and Recognition Challenge.
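The abstract does not spell out Seq2Tree's node update, but a tree-structured LSTM cell makes the contrast with the chain-structured LSTM concrete. Below is a minimal NumPy sketch of a Child-Sum Tree-LSTM cell in the style of Tai et al. (2015), in which a parent node's state is computed from its own input, the summed hidden states of its children, and one forget gate per child. The class and parameter names here are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of a Child-Sum Tree-LSTM cell (Tai et al., 2015).
# NOTE: the exact Seq2Tree cell is not given in the abstract; this is an
# illustrative tree-structured LSTM update, not the paper's actual method.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ChildSumTreeLSTMCell:
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        def init(rows, cols):
            return rng.normal(0.0, 0.1, size=(rows, cols))
        d, h = input_dim, hidden_dim
        # One (W, U, b) triple per gate: input, forget, output, update.
        self.W = {g: init(h, d) for g in "ifou"}
        self.U = {g: init(h, h) for g in "ifou"}
        self.b = {g: np.zeros(h) for g in "ifou"}

    def __call__(self, x, child_h, child_c):
        """x: (input_dim,); child_h, child_c: lists of (hidden_dim,) arrays."""
        # Sum the children's hidden states (zero vector at a leaf).
        h_tilde = np.sum(child_h, axis=0) if child_h else np.zeros_like(self.b["i"])
        i = sigmoid(self.W["i"] @ x + self.U["i"] @ h_tilde + self.b["i"])
        o = sigmoid(self.W["o"] @ x + self.U["o"] @ h_tilde + self.b["o"])
        u = np.tanh(self.W["u"] @ x + self.U["u"] @ h_tilde + self.b["u"])
        # One forget gate per child, conditioned on that child's hidden state.
        f = [sigmoid(self.W["f"] @ x + self.U["f"] @ hk + self.b["f"])
             for hk in child_h]
        c = i * u + sum(fk * ck for fk, ck in zip(f, child_c))
        h = o * np.tanh(c)
        return h, c

# Usage: a parent node merging two leaves (e.g., two phoneme-level states).
cell = ChildSumTreeLSTMCell(input_dim=40, hidden_dim=64)
h1, c1 = cell(np.ones(40), [], [])          # leaf 1
h2, c2 = cell(np.ones(40), [], [])          # leaf 2
h, c = cell(np.zeros(40), [h1, h2], [c1, c2])  # parent over both children
```

The per-child forget gates let the cell keep or discard each child subtree's memory independently, which is the property that makes a tree-structured cell a natural fit for merging phoneme-level states into word-level ones.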
