[1] Karen Livescu, et al. Deep convolutional acoustic word embeddings using word-pair side information, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Tianqi Chen, et al. Empirical Evaluation of Rectified Activations in Convolutional Network, 2015, ArXiv.
[3] Xiang Zhang, et al. Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems, 2015, ICLR.
[4] James R. Glass, et al. Learning Word Embeddings from Speech, 2017, ArXiv.
[5] Yu Zhang, et al. Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data, 2017, NIPS.
[6] Bowen Zhou, et al. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond, 2016, CoNLL.
[7] Christian Biemann, et al. Unspeech: Unsupervised Speech Context Embeddings, 2018, INTERSPEECH.
[8] Mark J. F. Gales, et al. Improving Interpretability and Regularization in Deep Learning, 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[9] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[10] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[11] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[12] Phil D. Green, et al. Using phone features to improve dialogue state tracking generalisation to unseen states, 2016, SIGDIAL Conference.
[13] Enhong Chen, et al. Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective, 2015, IJCAI.
[14] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[15] Sanjeev Khudanpur, et al. Librispeech: An ASR corpus based on public domain audio books, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Victor Zue, et al. Speech database development at MIT: TIMIT and beyond, 1990, Speech Communication.
[17] D. J. Ostry, et al. Coarticulation of jaw movements in speech production: is context sensitivity in speech kinematics centrally planned?, 1996, The Journal of Neuroscience.
[18] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.
[19] James R. Glass, et al. Scalable Factorized Hierarchical Variational Autoencoder Training, 2018, INTERSPEECH.
[20] Andriy Mnih, et al. Disentangling by Factorising, 2018, ICML.
[21] Christopher Burgess, et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, 2016, ICLR.
[22] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[23] Georg Heigold, et al. Word embeddings for speech recognition, 2014, INTERSPEECH.
[24] James R. Glass, et al. Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech, 2018, INTERSPEECH.
[25] Karen Livescu, et al. Discriminative acoustic word embeddings: Recurrent neural network-based approaches, 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[26] Lucia Specia, et al. Exploring the use of acoustic embeddings in neural machine translation, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[27] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[28] Johann-Mattis List, et al. SCA: Phonetic Alignment Based on Sound Classes, 2011, ESSLLI Student Sessions.
[29] Yu Zhang, et al. Learning Latent Representations for Speech Generation and Transformation, 2017, INTERSPEECH.
[30] Omer Levy, et al. word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method, 2014, ArXiv.
[31] Alan L. Higgins, et al. Speaker verifier using nearest-neighbor distance measure, 1995.
[32] Alexei A. Efros, et al. Context Encoders: Feature Learning by Inpainting, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] James F. Allen, et al. Bi-directional conversion between graphemes and phonemes using a joint N-gram model, 2001, SSW.
[34] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.