Strong and Simple Baselines for Multimodal Utterance Embeddings
Ruslan Salakhutdinov | Louis-Philippe Morency | Paul Pu Liang | Yao-Hung Hubert Tsai | Yao Chong Lim
[1] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[2] Jeffrey L. Elman,et al. Finding Structure in Time , 1990, Cogn. Sci..
[3] Paavo Alku,et al. Glottal wave analysis with Pitch Synchronous Iterative Adaptive Inverse Filtering , 1991, Speech Commun..
[4] D G Childers,et al. Vocal quality factors: analysis, synthesis, and perception. , 1991, The Journal of the Acoustical Society of America.
[5] M. Knapp,et al. The Interaction of Visual and Verbal Features in Human Communication , 1992 .
[6] Paavo Alku,et al. Parabolic spectral parameter - A new method for quantification of the glottal flow , 1997, Speech Commun..
[7] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[8] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..
[9] Lakhmi C. Jain,et al. Recurrent Neural Networks: Design and Applications , 1999 .
[10] P. Tseng. Convergence of a Block Coordinate Descent Method for Nondifferentiable Minimization , 2001 .
[11] P. Alku,et al. Normalized amplitude quotient for parametrization of the glottal flow. , 2002, The Journal of the Acoustical Society of America.
[12] Marilyn A. Walker,et al. MATCH: An Architecture for Multimodal Dialogue Systems , 2002, ACL.
[13] Cynthia LeRouge,et al. Developing multimodal intelligent affective interfaces for tele-home health care , 2003, Int. J. Hum. Comput. Stud..
[14] Ronald,et al. Learning representations by backpropagating errors , 2004 .
[15] Alexander I. Rudnicky. Multimodal Dialogue Systems , 2005 .
[16] Mei-Chen Yeh,et al. Fast Human Detection Using a Cascade of Histograms of Oriented Gradients , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[17] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[18] Geoffrey E. Hinton,et al. Three new graphical models for statistical language modelling , 2007, ICML '07.
[19] Mark Liberman,et al. Speaker identification on the SCOTUS corpus , 2008 .
[20] Peter Robinson,et al. Multimodal Affect Recognition in Intelligent Tutoring Systems , 2011, ACII.
[21] Fernando De la Torre,et al. Facial Expression Analysis , 2011, Visual Analysis of Humans.
[22] Abeer Alwan,et al. Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics , 2011, INTERSPEECH.
[23] Nitish Srivastava,et al. Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..
[24] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[25] Xia Mao,et al. Multimodal Intelligent Tutoring Systems , 2012 .
[26] Patrick A. Naylor,et al. Detection of Glottal Closure Instants From Speech Signals: A Quantitative Review , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[27] Stephen Grossberg,et al. Recurrent neural networks , 2013, Scholarpedia.
[28] John Kane,et al. Wavelet Maxima Dispersion for Breathy to Tense Voice Discrimination , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[29] Yuki Suga,et al. Multimodal integration learning of robot behavior using deep neural networks , 2014, Robotics Auton. Syst..
[30] John Kane,et al. COVAREP — A collaborative voice analysis repository for speech technologies , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Stefan Carlsson,et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[32] Louis-Philippe Morency,et al. Computational Analysis of Persuasiveness in Social Multimedia: A Novel Dataset and Multimodal Prediction Approach , 2014, ICMI.
[33] G. Vigliocco,et al. Language as a multimodal phenomenon: implications for language learning, processing and evolution , 2014, Philosophical Transactions of the Royal Society B: Biological Sciences.
[34] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[35] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[36] Stephen J. Wright. Coordinate descent algorithms , 2015, Mathematical Programming.
[37] Dan Klein,et al. When and why are log-linear models self-normalizing? , 2015, NAACL.
[38] Hal Daumé,et al. Deep Unordered Composition Rivals Syntactic Methods for Text Classification , 2015, ACL.
[39] Thea van der Geest,et al. Focus on Accessibility: Multimodal Healthcare Technology for All , 2016, MMHealth@ACM Multimedia.
[40] Sanjeev Arora,et al. A Latent Variable Model Approach to PMI-based Word Embeddings , 2015, TACL.
[41] Roland Göcke,et al. Extending Long Short-Term Memory for Multi-View Structured Learning , 2016, ECCV.
[42] Louis-Philippe Morency,et al. Deep multimodal fusion for persuasiveness prediction , 2016, ICMI.
[43] Louis-Philippe Morency,et al. Multimodal Sentiment Intensity Analysis in Videos: Facial Gestures and Verbal Messages , 2016, IEEE Intelligent Systems.
[44] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[45] Sanjeev Arora,et al. A Simple but Tough-to-Beat Baseline for Sentence Embeddings , 2017, ICLR.
[46] Erik Cambria,et al. Tensor Fusion Network for Multimodal Sentiment Analysis , 2017, EMNLP.
[47] Charles Blundell,et al. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles , 2016, NIPS.
[48] Juan Chica,et al. A Multimodal Robot Based Model for the Preservation of Intangible Cultural Heritage , 2017, HRI.
[49] Ersin Yumer,et al. Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Daniel Sonntag,et al. Interakt - A Multimodal Multisensory Interactive Cognitive Assessment Tool , 2017, ArXiv.
[51] Aaron C. Courville,et al. Improved Training of Wasserstein GANs , 2017, NIPS.
[52] Graham Neubig,et al. Stronger Baselines for Trustable Results in Neural Machine Translation , 2017, NMT@ACL.
[53] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[54] Erik Cambria,et al. Memory Fusion Network for Multi-view Sequential Learning , 2018, AAAI.
[55] Guoyin Wang,et al. Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms , 2018, ACL.
[56] Vladlen Koltun,et al. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling , 2018, ArXiv.
[57] Jingtao Wang,et al. Predicting Learners' Emotions in Mobile MOOC Learning via a Multimodal Intelligent Tutor , 2018, ITS.
[58] Louis-Philippe Morency,et al. Multimodal Language Analysis with Recurrent Multistage Fusion , 2018, EMNLP.
[59] Chan Woo Lee,et al. Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data , 2018, ArXiv.
[60] Ruslan Salakhutdinov,et al. Learning Factorized Multimodal Representations , 2018, ICLR.
[61] Louis-Philippe Morency,et al. Multimodal Machine Learning: A Survey and Taxonomy , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[62] Douwe Kiela,et al. No Training Required: Exploring Random Encoders for Sentence Classification , 2019, ICLR.
[63] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.