Perspectives on predictive power of multimodal deep learning: surprises and future directions

Samy Bengio | Björn Schuller | Louis-Philippe Morency | Li Deng

[1] Louis-Philippe Morency, et al. Multimodal Machine Learning: A Survey and Taxonomy, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] George Trigeorgis, et al. Deep Canonical Time Warping for Simultaneous Alignment and Representation Learning of Sequences, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] Stefanos Zafeiriou, et al. End2You - The Imperial Toolkit for Multimodal Profiling by End-to-End Learning, 2018, ArXiv.

[4] Li Deng, et al. Artificial Intelligence in the Rising Wave of Deep Learning: The Historical Path and Future Outlook [Perspectives], 2018, IEEE Signal Processing Magazine.

[5] Li Deng, et al. Question-Answering with Grammatically-Interpretable Representations, 2017, AAAI.

[6] Li Deng, et al. Deep Learning for Image-to-Text Generation: A Technical Overview, 2017, IEEE Signal Processing Magazine.

[7] Alfred O. Hero, et al. Challenges and Open Problems in Signal Processing: Panel Discussion Summary from ICASSP 2017 [Panel and Forum], 2017, IEEE Signal Processing Magazine.

[8] Björn W. Schuller, et al. From Hard to Soft: Towards more Human-like Emotion Recognition by Modelling the Perception Uncertainty, 2017, ACM Multimedia.

[9] Fabien Ringeval, et al. End-to-end learning for dimensional emotion recognition from physiological signals, 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).

[10] Antonio Torralba, et al. SoundNet: Learning Sound Representations from Unlabeled Video, 2016, NIPS.

[11] Pascale Fung, et al. A Long Short-Term Memory Framework for Predicting Humor in Dialogues, 2016, NAACL.

[12] George Trigeorgis, et al. Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13] Koray Kavukcuoglu, et al. Pixel Recurrent Neural Networks, 2016, ICML.

[14] Jianfeng Gao, et al. Reasoning in Vector Space: An Exploratory Study of Question Answering, 2016, ICLR.

[15] Jason Yosinski, et al. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Eduardo Coutinho, et al. Cooperative Learning and its Application to Emotion Recognition from Speech, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[17] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[18] Ruslan Salakhutdinov, et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, 2014, ArXiv.

[19] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.

[20] Yifan Gong, et al. Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[21] Geoffrey Zweig, et al. Recent advances in deep learning for speech research at Microsoft, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[22] Xiao Li, et al. Machine Learning Paradigms for Speech Recognition: An Overview, 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[23] Juhan Nam, et al. Multimodal Deep Learning, 2011, ICML.

[24] Maja Pantic, et al. Classifying laughter and speech using audio-visual feature prediction, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[25] Björn W. Schuller, et al. Recognition of spontaneous conversational speech using long short-term memory phoneme predictions, 2010, INTERSPEECH.

[26] Hui Lin, et al. A study on multilingual acoustic modeling for large vocabulary ASR, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[27] Björn W. Schuller, et al. Abandoning emotion classes - towards continuous emotion recognition with modelling of long-range dependencies, 2008, INTERSPEECH.

[28] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.

[29] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.

[30] Björn W. Schuller, et al. A Combined LSTM-RNN - HMM - Approach for Meeting Event Segmentation and Recognition, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.