Audio Word2vec: Sequence-to-Sequence Autoencoding for Unsupervised Learning of Audio Segmentation and Representation
Hung-yi Lee | Chia-Hao Shen | Sung-Feng Huang | Yi-Chen Chen | Yu-Hsuan Wang
[1] Yu Zhang et al. Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[2] Brian Kingsbury et al. End-to-end ASR-free keyword search from speech, 2017, ICASSP.
[3] Tanja Schultz et al. GlobalPhone: a multilingual speech and text database developed at Karlsruhe University, 2002, INTERSPEECH.
[4] Bin Ma et al. An acoustic segment modeling approach to query-by-example spoken term detection, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Hugo Van hamme et al. Fast word acquisition in an NMF-based learning framework, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Lin-Shan Lee et al. Audio Word2Vec: Unsupervised Learning of Audio Segment Representations Using Sequence-to-Sequence Autoencoder, 2016, INTERSPEECH.
[7] Yifan Gong et al. Unsupervised adaptation with domain separation networks for robust speech recognition, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[8] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[9] Samy Bengio et al. Generating Sentences from a Continuous Space, 2015, CoNLL.
[10] Lin-Shan Lee et al. Phonetic-and-Semantic Embedding of Spoken Words with Applications in Spoken Content Retrieval, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[11] Quoc V. Le et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[12] Karen Livescu et al. Multi-view Recurrent Neural Acoustic Word Embeddings, 2016, ICLR.
[13] Yu Zhang et al. Learning Latent Representations for Speech Generation and Transformation, 2017, INTERSPEECH.
[14] Kenneth Ward Church et al. Towards spoken term discovery at scale with zero resources, 2010, INTERSPEECH.
[15] James R. Glass et al. A Nonparametric Bayesian Approach to Acoustic Model Discovery, 2012, ACL.
[16] Andrew L. Maas et al. Word-level Acoustic Modeling with Convolutional Vector Regression, 2012.
[17] Aren Jansen et al. Exploiting Discriminative Point Process Models for Spoken Term Detection, 2012, INTERSPEECH.
[18] Adam Lopez et al. Towards speech-to-text translation without speech recognition, 2017, EACL.
[19] Daniel Jurafsky et al. A Hierarchical Neural Autoencoder for Paragraphs and Documents, 2015, ACL.
[20] Erhardt Barth et al. A Hybrid Convolutional Variational Autoencoder for Text Generation, 2017, EMNLP.
[21] Hung-yi Lee et al. Gate Activation Signal Analysis for Gated Recurrent Neural Networks and its Correlation with Phoneme Boundaries, 2017, INTERSPEECH.
[22] Karen Livescu et al. Deep convolutional acoustic word embeddings using word-pair side information, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] James R. Glass et al. Unsupervised Pattern Discovery in Speech, 2008, IEEE Transactions on Audio, Speech, and Language Processing.
[24] Haizhou Li et al. An Iterative Approach to Model Merging for Speech Pattern Discovery, 2011.
[25] Quoc V. Le et al. Distributed Representations of Sentences and Documents, 2014, ICML.
[26] Patrick Kenny et al. Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification, 2009, INTERSPEECH.
[27] Aren Jansen et al. Towards Learning Semantic Audio Representations from Unlabeled Data, 2017.
[28] Aren Jansen et al. The zero resource speech challenge 2017, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[29] Yu Zhang et al. Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data, 2017, NIPS.
[30] Hung-yi Lee et al. Language Transfer of Audio Word2Vec: Learning Audio Segment Representations Without Target Language Data, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Aren Jansen et al. Segmental acoustic indexing for zero resource keyword search, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Lin-Shan Lee et al. Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Karen Livescu et al. An embedded segmental K-means model for unsupervised segmentation and clustering of speech, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[34] Tara N. Sainath et al. Query-by-example keyword spotting using long short-term memory networks, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Aren Jansen et al. Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings, 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[36] Majid Mirbagheri et al. ASR for Under-Resourced Languages From Probabilistic Transcription, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[37] Karen Livescu et al. Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings, 2017, INTERSPEECH.
[38] Herman Kamper et al. Truly Unsupervised Acoustic Word Embeddings Using Weak Top-down Constraints in Encoder-decoder Models, 2018, 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Okko Johannes Räsänen et al. Basic cuts revisited: Temporal segmentation of speech into phone-like units with statistical learning at a pre-linguistic level, 2014, CogSci.
[40] Balaraman Ravindran et al. Diversity driven attention model for query-based abstractive summarization, 2017, ACL.
[41] Aren Jansen et al. Towards Unsupervised Training of Speaker Independent Acoustic Models, 2011, INTERSPEECH.
[42] Yishay Mansour et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[43] Nobuaki Minematsu et al. Unsupervised optimal phoneme segmentation: Objectives, algorithm and comparisons, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[44] Alec Radford et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[45] Sanja Fidler et al. Skip-Thought Vectors, 2015, NIPS.
[46] Yoshua Bengio et al. Hierarchical Multiscale Recurrent Neural Networks, 2016, ICLR.
[47] Nitish Srivastava et al. Unsupervised Learning of Video Representations using LSTMs, 2015, ICML.
[48] Aren Jansen et al. A segmental framework for fully-unsupervised large-vocabulary speech recognition, 2016, Comput. Speech Lang.
[49] Sanjeev Khudanpur et al. Librispeech: An ASR corpus based on public domain audio books, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[50] Jeffrey Dean et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[51] Lin-Shan Lee et al. Unsupervised Discovery of Structured Acoustic Tokens With Applications to Spoken Term Detection, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[52] Léon Bottou et al. Wasserstein GAN, 2017, arXiv.
[53] Emmanuel Dupoux et al. Learning Word Embeddings: Unsupervised Methods for Fixed-size Representations of Variable-length Speech Segments, 2018, INTERSPEECH.
[54] Jeffrey Dean et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[55] Lin-Shan Lee et al. Unsupervised spoken term detection with spoken queries by multi-level acoustic patterns with varying model granularity, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[56] Karen Livescu et al. Discriminative acoustic word embeddings: Recurrent neural network-based approaches, 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[57] Frank K. Soong et al. A segment model based approach to speech recognition, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.
[58] Boon Pang Lim et al. Multitask Learning for Phone Recognition of Underresourced Languages Using Mismatched Transcription, 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[59] Pierre Baldi et al. Autoencoders, Unsupervised Learning, and Deep Architectures, 2011, ICML Unsupervised and Transfer Learning.
[60] Geoffrey E. Hinton et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[61] François Laviolette et al. Domain-Adversarial Training of Neural Networks, 2015, J. Mach. Learn. Res.
[62] I-Fan Chen et al. A hybrid HMM/DNN approach to keyword spotting of short words, 2013, INTERSPEECH.
[63] Lin-Shan Lee et al. Enhanced Spoken Term Detection Using Support Vector Machines and Weighted Pseudo Examples, 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[64] James R. Glass et al. Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech, 2018, INTERSPEECH.
[65] Aaron C. Courville et al. Improved Training of Wasserstein GANs, 2017, NIPS.
[66] Yoshua Bengio et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[67] Micha Elsner et al. Speech segmentation with a neural encoder model of working memory, 2017, EMNLP.
[68] James R. Glass et al. Towards multi-speaker unsupervised speech pattern discovery, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[69] Georg Heigold et al. Word embeddings for speech recognition, 2014, INTERSPEECH.
[70] Giampiero Salvi et al. Word Discovery with Beta Process Factor Analysis, 2012, INTERSPEECH.
[71] Aren Jansen et al. Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings, 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[72] Kenneth Ward Church et al. A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[73] Herbert Gish et al. Keyword Spotting of Arbitrary Words Using Minimal Speech Resources, 2006, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[74] Yoshua Bengio et al. Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation, 2013, arXiv.
[75] Hugo Van hamme et al. Discovering Phone Patterns in Spoken Utterances by Non-Negative Matrix Factorization, 2008, IEEE Signal Processing Letters.
[76] Nicolas Usunier et al. Joint Learning of Speaker and Phonetic Similarities with Siamese Networks, 2016, INTERSPEECH.
[77] James R. Glass et al. Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams, 2009, 2009 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).