Bidirectional LSTM Networks for Context-Sensitive Keyword Detection in a Cognitive Virtual Agent Framework
Björn W. Schuller | Alex Graves | Florian Eyben | Martin Wöllmer | Gerhard Rigoll
[1] Björn W. Schuller,et al. Robust discriminative keyword spotting for emotionally colored spontaneous speech using bidirectional LSTM networks , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[2] Jürgen Schmidhuber,et al. Learning Complex, Extended Sequences Using the Principle of History Compression , 1992, Neural Computation.
[3] Jeff A. Bilmes,et al. Graphical models and automatic speech recognition , 2002 .
[4] Jonathan G. Fiscus,et al. DARPA TIMIT: acoustic-phonetic continuous speech corpus CD-ROM, NIST speech disc 1-1.1 , 1993 .
[5] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.
[6] Björn W. Schuller,et al. Abandoning emotion classes - towards continuous emotion recognition with modelling of long-range dependencies , 2008, INTERSPEECH.
[7] Dirk Heylen,et al. Towards responsive Sensitive Artificial Listeners , 2008 .
[8] Sharon L. Oviatt,et al. Multimodal interface research: a science without borders , 2000, INTERSPEECH.
[9] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[10] A. Waibel,et al. MULTIMODAL HUMAN-COMPUTER INTERACTION , 1993 .
[11] Björn Schuller,et al. Being bored? Recognising natural interest by extensive audiovisual integration for real-life application , 2009, Image Vis. Comput..
[12] R. C. Rose,et al. Keyword detection in conversational speech utterances using hidden Markov model based continuous speech recognition , 1995, Comput. Speech Lang..
[13] Bhuvana Ramabhadran,et al. Vocabulary independent spoken term detection , 2007, SIGIR.
[14] Jürgen Schmidhuber,et al. Unconstrained On-line Handwriting Recognition with Recurrent Neural Networks , 2007, NIPS.
[15] Sarel van Vuuren,et al. Relevance of time-frequency features for phonetic and speaker-channel classification , 2000, Speech Commun..
[16] Hervé Bourlard,et al. Connectionist Speech Recognition: A Hybrid Approach , 1993 .
[17] Geoffrey Zweig,et al. The graphical models toolkit: An open source software system for speech and time-series processing , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[18] Alex Graves,et al. Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.
[19] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..
[20] L. Baum,et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .
[21] Björn W. Schuller,et al. On-line emotion recognition in a 3-D activation-valence-time continuum using acoustic and linguistic cues , 2009, Journal on Multimodal User Interfaces.
[22] Samy Bengio,et al. Posterior based keyword spotting with a priori thresholds , 2006, INTERSPEECH.
[23] Richard Rose,et al. A hidden Markov model based keyword recognition system , 1990, International Conference on Acoustics, Speech, and Signal Processing.
[24] Trevor Darrell,et al. Hidden Conditional Random Fields , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[25] Stephen Cox,et al. Some statistical issues in the comparison of speech recognition algorithms , 1989, International Conference on Acoustics, Speech, and Signal Processing.
[26] Leslie G. Valiant,et al. Cognitive computation , 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[27] Peter Tiño,et al. Learning long-term dependencies in NARX recurrent neural networks , 1996, IEEE Trans. Neural Networks.
[28] Björn W. Schuller,et al. A multidimensional dynamic time warping algorithm for efficient multimodal fusion of asynchronous data streams , 2009, Neurocomputing.
[29] Geoffrey Zweig,et al. Exact alpha-beta computation in logarithmic space with application to MAP word graph construction , 2000, INTERSPEECH.
[30] Björn W. Schuller,et al. Recognising interest in conversational speech - comparing bag of frames and supra-segmental features , 2009, INTERSPEECH.
[31] Mitsuru Ishizuka,et al. A chat system based on emotion estimation from text and embodied conversational messengers , 2005, Proceedings of the 2005 International Conference on Active Media Technology, 2005. (AMT 2005)..
[32] Jeff A. Bilmes,et al. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models , 1998 .
[33] Jonathan G. Fiscus,et al. DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM, NIST , 1993 .
[34] Yoshua Bengio,et al. Markovian Models for Sequential Data , 2004 .
[35] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .
[36] A. Graves,et al. Unconstrained Online Handwriting Recognition with Recurrent Neural Networks , 2007 .
[37] Mark Johnson,et al. Mathematical Foundations of Speech and Language Processing , 2004 .
[38] Gérard Chollet,et al. Confidence measures for keyword spotting using support vector machines , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[39] Alex Graves,et al. Connectionist Temporal Classification , 2012 .
[40] Ronald J. Williams,et al. Gradient-based learning algorithms for recurrent networks and their computational complexity , 1995 .
[41] Hui Lin,et al. Improving multi-lattice alignment based spoken keyword spotting , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[42] J. Bilmes. Gaussian Models in Automatic Speech Recognition , 2008 .
[43] S. C. Kremer,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .
[44] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[45] Kostas Karpouzis,et al. The HUMAINE Database: Addressing the Collection and Annotation of Naturalistic and Induced Emotional Data , 2007, ACII.
[46] Jürgen Schmidhuber,et al. Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition , 2005, ICANN.
[47] Björn W. Schuller,et al. Data-driven clustering in emotional space for affect recognition using discriminatively trained LSTM networks , 2009, INTERSPEECH.
[48] Jürgen Schmidhuber,et al. Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.
[49] Björn W. Schuller,et al. Robust vocabulary independent keyword spotting with graphical models , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.
[50] Steffen Udluft,et al. Learning long-term dependencies with recurrent neural networks , 2008, Neurocomputing.
[51] Marcus Liwicki,et al. A novel approach to on-line handwriting recognition based on bidirectional long short-term memory networks , 2007 .
[52] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .
[53] Daniel P. W. Ellis,et al. Tandem connectionist feature extraction for conventional HMM systems , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[54] Tom Ziemke,et al. On the Role of Emotion in Embodied Cognitive Architectures: From Organisms to Robots , 2009, Cognitive Computation.
[55] Jeff A. Bilmes,et al. Maximum mutual information based reduction strategies for cross-correlation based joint distributional modeling , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[56] J.A. Bilmes,et al. Graphical model architectures for speech recognition , 2005, IEEE Signal Processing Magazine.
[57] Hui Lin,et al. OOV detection by joint word/phone lattice alignment , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).
[58] Matthew Turk,et al. Multimodal Human-Computer Interaction , 2005 .
[59] Henry Lieberman,et al. A model of textual affect sensing using real-world knowledge , 2003, IUI '03.
[60] Herbert Jaeger,et al. The "echo state" approach to analysing and training recurrent neural networks , 2001 .
[61] Samy Bengio,et al. Discriminative keyword spotting , 2009, Speech Commun..
[62] A. Nakamura,et al. Nature (London , 1975 .
[63] Geoffrey E. Hinton,et al. A time-delay neural network architecture for isolated word recognition , 1990, Neural Networks.
[64] Hervé Bourlard,et al. Enhanced Phone Posteriors for Improving Speech Recognition Systems , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[65] Lakhmi C. Jain,et al. Introduction to Bayesian Networks , 2008 .
[66] Jürgen Schmidhuber,et al. An Application of Recurrent Neural Networks to Discriminative Keyword Spotting , 2007, ICANN.
[67] Daniel P. W. Ellis,et al. Tandem acoustic modeling in large-vocabulary recognition , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[68] Michael Weintraub,et al. Keyword-spotting using SRI's DECIPHER large-vocabulary speech-recognition system , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[69] Björn W. Schuller,et al. Robust in-car spelling recognition - a tandem BLSTM-HMM approach , 2009, INTERSPEECH.
[70] Björn W. Schuller,et al. Spoken term detection with Connectionist Temporal Classification: A novel hybrid CTC-DBN decoder , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.