Deep learning for multisensorial and multimodal interaction
