Tandem decoding of children's speech for keyword detection in a child-robot interaction scenario
Björn W. Schuller | Martin Wöllmer | Anton Batliner | Stefan Steidl | Dino Seppi
[1] Jürgen Schmidhuber,et al. Learning Complex, Extended Sequences Using the Principle of History Compression , 1992, Neural Computation.
[2] Kornel Laskowski,et al. Combining Efforts for Improving Automatic Classification of Emotional User States , 2006 .
[3] Hynek Hermansky,et al. Multi-resolution RASTA filtering for TANDEM-based ASR , 2005, INTERSPEECH.
[4] B. Schuller,et al. Switching Linear Dynamic Models for Recognition of Emotionally Colored and Noisy Speech , 2010, Sprachkommunikation.
[5] Björn W. Schuller,et al. Emotion recognition from speech: Putting ASR in the loop , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[6] Michael Picheny,et al. Improvements in children's speech recognition performance , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[7] Jürgen Schmidhuber,et al. An Application of Recurrent Neural Networks to Discriminative Keyword Spotting , 2007, ICANN.
[8] Björn W. Schuller,et al. Recognition of spontaneous conversational speech using long short-term memory phoneme predictions , 2010, INTERSPEECH.
[9] Herbert Jaeger,et al. The "echo state" approach to analysing and training recurrent neural networks , 2001 .
[10] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..
[11] Dirk Heylen,et al. Towards responsive Sensitive Artificial Listeners , 2008 .
[12] Björn W. Schuller,et al. On the Impact of Children's Emotional Speech on Acoustic and Language Models , 2010, EURASIP J. Audio Speech Music. Process..
[13] Alex Graves,et al. A Tandem BLSTM-DBN Architecture for Keyword Spotting with Enhanced Context Modeling , 2009, NOLISP 2009.
[14] Ronald A. Cole,et al. Highly accurate children's speech recognition for interactive reading tutors using subword units , 2007, Speech Commun..
[15] Geoffrey Zweig,et al. Exact alpha-beta computation in logarithmic space with application to MAP word graph construction , 2000, INTERSPEECH.
[16] Björn W. Schuller,et al. Recognising interest in conversational speech - comparing bag of frames and supra-segmental features , 2009, INTERSPEECH.
[17] Martin Wöllmer,et al. Tandem decoding of children's speech for keyword detection in a child-robot interaction scenario , 2011 .
[18] José L. Pérez-Córdoba,et al. Histogram equalization of speech representation for robust speech recognition , 2005, IEEE Transactions on Speech and Audio Processing.
[19] Joakim Gustafson,et al. Voice transformations for improving children's speech recognition in a publicly available dialogue system , 2002, INTERSPEECH.
[20] Corinna Cortes,et al. Support-Vector Networks , 1995, Machine Learning.
[21] Jonathan Foote,et al. An overview of audio information retrieval , 1999, Multimedia Systems.
[22] Jeff A. Bilmes,et al. Maximum mutual information based reduction strategies for cross-correlation based joint distributional modeling , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[23] R. C. Rose,et al. Keyword detection in conversational speech utterances using hidden Markov model based continuous speech recognition , 1995, Comput. Speech Lang..
[24] Bhuvana Ramabhadran,et al. Vocabulary independent spoken term detection , 2007, SIGIR.
[25] Loïc Kessous,et al. Whodunnit - Searching for the most important feature types signalling emotion-related user states in speech , 2011, Comput. Speech Lang..
[26] Peter Tiño,et al. Learning long-term dependencies in NARX recurrent neural networks , 1996, IEEE Trans. Neural Networks.
[27] Samy Bengio,et al. Discriminative keyword spotting , 2009, Speech Commun..
[28] Yoshua Bengio,et al. Markovian Models for Sequential Data , 2004 .
[29] Samy Bengio,et al. Posterior based keyword spotting with a priori thresholds , 2006, INTERSPEECH.
[30] Björn W. Schuller,et al. Robust discriminative keyword spotting for emotionally colored spontaneous speech using bidirectional LSTM networks , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[31] Shrikanth S. Narayanan,et al. Automatic speech recognition for children , 1997, EUROSPEECH.
[32] Alex Graves,et al. Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.
[33] Jürgen Schmidhuber,et al. Sequence Labelling in Structured Domains with Hierarchical Recurrent Neural Networks , 2007, IJCAI.
[34] Sarel van Vuuren,et al. Relevance of time-frequency features for phonetic and speaker-channel classification , 2000, Speech Commun..
[35] Geoffrey Zweig,et al. The graphical models toolkit: An open source software system for speech and time-series processing , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[36] Stuart J. Russell,et al. Dynamic bayesian networks: representation, inference and learning , 2002 .
[37] Daniel P. W. Ellis,et al. Tandem acoustic modeling in large-vocabulary recognition , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[38] Björn W. Schuller,et al. Robust in-car spelling recognition - a tandem BLSTM-HMM approach , 2009, INTERSPEECH.
[39] B. Repp. Some observations on the development of anticipatory coarticulation. , 1986, The Journal of the Acoustical Society of America.
[40] Geoffrey E. Hinton,et al. A time-delay neural network architecture for isolated word recognition , 1990, Neural Networks.
[41] Hervé Bourlard,et al. Enhanced Phone Posteriors for Improving Speech Recognition Systems , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[42] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[43] Jürgen Schmidhuber,et al. Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition , 2005, ICANN.
[44] Jürgen Schmidhuber,et al. Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks , 2007, NIPS.
[45] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[46] C. Mayo,et al. The influence of phonemic awareness development on acoustic cue weighting strategies in children's speech perception. , 2003, Journal of speech, language, and hearing research : JSLHR.
[47] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .
[48] Steffen Udluft,et al. Learning Long Term Dependencies with Recurrent Neural Networks , 2006, ICANN.
[49] Björn Schuller,et al. Being bored? Recognising natural interest by extensive audiovisual integration for real-life application , 2009, Image Vis. Comput..
[50] Daniel P. W. Ellis,et al. Tandem connectionist feature extraction for conventional HMM systems , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[51] Björn W. Schuller,et al. Combining Long Short-Term Memory and Dynamic Bayesian Networks for Incremental Emotion-Sensitive Artificial Listening , 2010, IEEE Journal of Selected Topics in Signal Processing.
[52] Jürgen Schmidhuber,et al. Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.
[53] Björn W. Schuller,et al. Robust vocabulary independent keyword spotting with graphical models , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.
[54] Marcus Liwicki,et al. A novel approach to on-line handwriting recognition based on bidirectional long short-term memory networks , 2007 .
[55] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .
[56] Diego Giuliani,et al. Investigating recognition of children's speech , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[57] Shrikanth S. Narayanan,et al. Creating conversational interfaces for children , 2002, IEEE Trans. Speech Audio Process..
[58] Lakhmi C. Jain,et al. Introduction to Bayesian Networks , 2008 .
[59] Frantisek Grézl,et al. Optimizing bottle-neck features for lvcsr , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[60] Björn W. Schuller,et al. A multidimensional dynamic time warping algorithm for efficient multimodal fusion of asynchronous data streams , 2009, Neurocomputing.
[61] Björn W. Schuller,et al. Does affect affect automatic recognition of children's speech? , 2008, WOCCI.
[62] A. Graves,et al. Unconstrained Online Handwriting Recognition with Recurrent Neural Networks , 2007 .
[63] Gérard Chollet,et al. Confidence measures for keyword spotting using support vector machines , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[64] Alex Graves,et al. Connectionist Temporal Classification , 2012 .
[65] Samy Bengio,et al. An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition , 2002, NIPS.
[66] Hervé Bourlard,et al. Connectionist Speech Recognition: A Hybrid Approach , 1993 .
[67] J.A. Bilmes,et al. Graphical model architectures for speech recognition , 2005, IEEE Signal Processing Magazine.
[68] Jeff A. Bilmes,et al. A gentle tutorial of the em algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models , 1998 .
[69] Ronald J. Williams,et al. Gradient-based learning algorithms for recurrent networks and their computational complexity , 1995 .
[70] J. Bilmes. Gaussian Models in Automatic Speech Recognition , 2008 .
[71] Richard Rose,et al. A hidden Markov model based keyword recognition system , 1990, International Conference on Acoustics, Speech, and Signal Processing.
[72] Trevor Darrell,et al. Hidden Conditional Random Fields , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[73] Shrikanth S. Narayanan,et al. Analyzing Children's Speech: An Acoustic Study of Consonants and Consonant-Vowel Transition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[74] Stefan Steidl,et al. Automatic classification of emotion related user states in spontaneous children's speech , 2009 .
[75] Jeff A. Bilmes,et al. Graphical models and automatic speech recognition , 2002 .
[76] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.