Generating Sound Words from Audio Signals of Acoustic Events with Sequence-to-Sequence Model
[1] Hervé Bourlard, et al. Connectionist probability estimators in HMM speech recognition, 1994, IEEE Trans. Speech Audio Process.
[2] Satoshi Nakamura, et al. Data collection in real acoustical environments for sound scene understanding and hands-free speech recognition, 1999, EUROSPEECH.
[3] Hiroshi G. Okuno, et al. Automatic transformation of environmental sounds into sound-imitation words based on Japanese syllable structure, 2003, INTERSPEECH.
[4] Renate Sitte, et al. Comparison of techniques for environmental sound recognition, 2003, Pattern Recognit. Lett.
[5] Jürgen Schmidhuber, et al. Framewise phoneme classification with bidirectional LSTM and other neural network architectures, 2005, Neural Networks.
[6] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.
[7] Shrikanth S. Narayanan, et al. Classification of sound clips by two schemes: Using onomatopoeia and semantic labels, 2008, 2008 IEEE International Conference on Multimedia and Expo.
[8] T. Nishiura, et al. The Acoustic Sound Field Dictation with Hidden Markov Model Based on an Onomatopeia, 2010.
[9] Lukás Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.
[10] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[11] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[12] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[13] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[14] Yoshua Bengio, et al. End-to-end attention-based large vocabulary speech recognition, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Kunio Kashino, et al. Visualizing Video Sounds With Sound Word Animation to Enrich User Experience, 2017, IEEE Transactions on Multimedia.
[16] Yu Zhang, et al. Very deep convolutional networks for end-to-end speech recognition, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).