Deep learning for multisensorial and multimodal interaction

Stefanos Zafeiriou | Björn Schuller | Gil Keren | Olivier Pietquin | Amr El-Desoky Mousa

[1] Björn W. Schuller, et al. Tunable Sensitivity to Large Errors in Neural Network Training, 2017, AAAI.

[2] Ashutosh Saxena, et al. Deep multimodal embedding: Manipulating novel objects with point-clouds, language and trajectories, 2015, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[3] Fei-Fei Li, et al. Deep visual-semantic alignments for generating image descriptions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Björn W. Schuller, et al. Convolutional Neural Networks with Data Augmentation for Classifying Speakers' Native Language, 2016, INTERSPEECH.

[5] Xin Zhao, et al. Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition, 2016, IJCAI.

[6] George Trigeorgis, et al. Deep Canonical Time Warping, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Bernt Schiele, et al. Learning Deep Representations of Fine-Grained Visual Descriptions, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] John Salvatier, et al. Theano: A Python framework for fast computation of mathematical expressions, 2016, ArXiv.

[9] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.

[10] Alexander C. Berg, et al. Combining multiple sources of knowledge in deep CNNs for action recognition, 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[11] Björn W. Schuller, et al. Convolutional RNN: An enhanced model for extracting features from sequential data, 2016, 2016 International Joint Conference on Neural Networks (IJCNN).

[12] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[13] Ruslan Salakhutdinov, et al. Generating Images from Captions with Attention, 2015, ICLR.

[14] Matthew R. Walter, et al. Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences, 2015, AAAI.

[15] Wolfram Burgard, et al. Multimodal deep learning for robust RGB-D object recognition, 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[16] Yoshua Bengio, et al. Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks, 2015, IEEE Transactions on Multimedia.

[17] Yoshua Bengio, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.

[18] Navdeep Jaitly, et al. Pointer Networks, 2015, NIPS.

[19] Trevor Darrell, et al. Sequence to Sequence -- Video to Text, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20] Christopher Joseph Pal, et al. EmoNets: Multimodal deep learning approaches for emotion recognition in video, 2015, Journal on Multimodal User Interfaces.

[21] Christopher Joseph Pal, et al. Describing Videos by Exploiting Temporal Structure, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[23] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.

[24] Geoffrey E. Hinton, et al. Grammar as a Foreign Language, 2014, NIPS.

[25] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[26] Marcus Rohrbach, et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks, 2014, NAACL.

[27] Samy Bengio, et al. Show and tell: A neural image caption generator, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[30] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.

[31] Armand Joulin, et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping, 2014, NIPS.

[32] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.

[33] Trevor Darrell, et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34] Amr El-Desoky Mousa, et al. Sub-word based language modeling of morphologically rich languages for LVCSR, 2014.

[35] Robert A. Jacobs, et al. Transfer of object shape knowledge across visual and haptic modalities, 2014, CogSci.

[36] Geoffrey Zweig, et al. Linguistic Regularities in Continuous Space Word Representations, 2013, NAACL.

[37] Hagen Soltau, et al. Morpheme-based feature-rich language models using Deep Neural Networks for LVCSR of Egyptian Arabic, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[38] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[39] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[40] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine.

[41] Andrew Y. Ng, et al. Improving Word Representations via Global Context and Multiple Word Prototypes, 2012, ACL.

[42] Tara N. Sainath, et al. Deep Neural Network Language Models, 2012, WLM@NAACL-HLT.

[43] Mohan S. Kankanhalli, et al. Multimodal fusion for multimedia analysis: a survey, 2010, Multimedia Systems.

[44] Lukás Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.

[45] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.

[46] Björn W. Schuller, et al. On-line emotion recognition in a 3-D activation-valence-time continuum using acoustic and linguistic cues, 2009, Journal on Multimodal User Interfaces.

[47] Lukás Burget, et al. Neural network based language models for highly inflective languages, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[48] Jason Weston, et al. A unified architecture for natural language processing: deep neural networks with multitask learning, 2008, ICML '08.

[49] Benoit Huet, et al. Toward emotion indexing of multimedia excerpts, 2008, 2008 International Workshop on Content-Based Multimedia Indexing.

[50] Yoshua Bengio, et al. Hierarchical Probabilistic Neural Network Language Model, 2005, AISTATS.

[51] Yoshua Bengio, et al. A Neural Probabilistic Language Model, 2003, J. Mach. Learn. Res.

[52] Roberto Pieraccini, et al. Using Markov decision process for learning dialogue strategies, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[53] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.

[54] Kuldip K. Paliwal, et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.

[55] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[56] Lawrence D. Jackel, et al. Backpropagation Applied to Handwritten Zip Code Recognition, 1989, Neural Computation.

[57] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.