Perspectives on predictive power of multimodal deep learning: surprises and future directions

[1] Louis-Philippe Morency, et al. Multimodal Machine Learning: A Survey and Taxonomy, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] George Trigeorgis, et al. Deep Canonical Time Warping for Simultaneous Alignment and Representation Learning of Sequences, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] Stefanos Zafeiriou, et al. End2You - The Imperial Toolkit for Multimodal Profiling by End-to-End Learning, 2018, ArXiv.

[4] Li Deng, et al. Artificial Intelligence in the Rising Wave of Deep Learning: The Historical Path and Future Outlook [Perspectives], 2018, IEEE Signal Processing Magazine.

[5] Li Deng, et al. Question-Answering with Grammatically-Interpretable Representations, 2017, AAAI.

[6] Li Deng, et al. Deep Learning for Image-to-Text Generation: A Technical Overview, 2017, IEEE Signal Processing Magazine.

[7] Alfred O. Hero, et al. Challenges and Open Problems in Signal Processing: Panel Discussion Summary from ICASSP 2017 [Panel and Forum], 2017, IEEE Signal Processing Magazine.

[8] Björn W. Schuller, et al. From Hard to Soft: Towards more Human-like Emotion Recognition by Modelling the Perception Uncertainty, 2017, ACM Multimedia.

[9] Fabien Ringeval, et al. End-to-end learning for dimensional emotion recognition from physiological signals, 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).

[10] Antonio Torralba, et al. SoundNet: Learning Sound Representations from Unlabeled Video, 2016, NIPS.

[11] Pascale Fung, et al. A Long Short-Term Memory Framework for Predicting Humor in Dialogues, 2016, NAACL.

[12] George Trigeorgis, et al. Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13] Koray Kavukcuoglu, et al. Pixel Recurrent Neural Networks, 2016, ICML.

[14] Jianfeng Gao, et al. Reasoning in Vector Space: An Exploratory Study of Question Answering, 2016, ICLR.

[15] Jason Yosinski, et al. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Eduardo Coutinho, et al. Cooperative Learning and its Application to Emotion Recognition from Speech, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[17] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[18] Ruslan Salakhutdinov, et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, 2014, ArXiv.

[19] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.

[20] Yifan Gong, et al. Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[21] Geoffrey Zweig, et al. Recent advances in deep learning for speech research at Microsoft, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[22] Xiao Li, et al. Machine Learning Paradigms for Speech Recognition: An Overview, 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[23] Juhan Nam, et al. Multimodal Deep Learning, 2011, ICML.

[24] Maja Pantic, et al. Classifying laughter and speech using audio-visual feature prediction, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[25] Björn W. Schuller, et al. Recognition of spontaneous conversational speech using long short-term memory phoneme predictions, 2010, INTERSPEECH.

[26] Hui Lin, et al. A study on multilingual acoustic modeling for large vocabulary ASR, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[27] Björn W. Schuller, et al. Abandoning emotion classes - towards continuous emotion recognition with modelling of long-range dependencies, 2008, INTERSPEECH.

[28] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.

[29] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.

[30] Björn W. Schuller, et al. A Combined LSTM-RNN - HMM - Approach for Meeting Event Segmentation and Recognition, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.