Speech gesture generation from the trimodal context of text, audio, and speaker identity
Youngwoo Yoon | Geehyuk Lee | Minsu Jang | Joo-Haeng Lee | Jaeyeon Lee | Jaehong Kim | Bok Cha
[1] Leland McInnes,et al. UMAP: Uniform Manifold Approximation and Projection , 2018, J. Open Source Softw..
[2] Stefan Kopp,et al. Towards a Common Framework for Multimodal Generation: The Behavior Markup Language , 2006, IVA.
[3] Yuyu Xu,et al. Virtual character performance from speech , 2013, SCA '13.
[4] Wojciech Zaremba,et al. Improved Techniques for Training GANs , 2016, NIPS.
[5] Kate L. Howard,et al. Why rate when you could compare? Using the “EloChoice” package to assess pairwise comparisons of perceived physical strength , 2018, PloS one.
[6] Frederick R. Forst,et al. On robust estimation of the location parameter , 1980 .
[7] Justine Cassell,et al. BEAT: the Behavior Expression Animation Toolkit , 2001, Life-like characters.
[8] Jitendra Malik,et al. Learning Individual Styles of Conversational Gesture , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Michael Neff,et al. Multi-objective adversarial gesture generation , 2019, MIG.
[10] Yaser Sheikh,et al. Towards Social Artificial Intelligence: Nonverbal Social Signal Prediction in a Triadic Interaction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Alexei A. Efros,et al. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[12] P. J. Huber. Robust Estimation of a Location Parameter , 1964 .
[13] M. Studdert-Kennedy. Hand and Mind: What Gestures Reveal About Thought. , 1994 .
[14] Dario Pavllo,et al. 3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Michael Kipp,et al. Gesture generation by imitation: from human behavior to computer character animation , 2005 .
[16] Louis-Philippe Morency,et al. Multimodal Machine Learning: A Survey and Taxonomy , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[17] Ali Borji,et al. Pros and Cons of GAN Evaluation Measures , 2018, Comput. Vis. Image Underst..
[18] Autumn B. Hostetter,et al. Effects of personality and social situation on representational gesture production , 2012 .
[19] Jonas Beskow,et al. Style‐Controllable Speech‐Driven Gesture Synthesis Using Normalising Flows , 2020, Comput. Graph. Forum.
[20] Wei Chu,et al. Extensions of Gaussian processes for ranking: semi-supervised and active learning , 2005 .
[21] Matthias Scheutz,et al. Hand Gestures and Verbal Acknowledgments Improve Human-Robot Rapport , 2017, ICSR.
[22] Stefan Kopp,et al. The Relation of Speech and Gestures: Temporal Synchrony Follows Semantic Synchrony , 2011 .
[23] Naoshi Kaneko,et al. Analyzing Input and Output Representations for Speech-Driven Gesture Generation , 2019, IVA.
[24] Zheng Lin,et al. Learning Sentiment-Specific Word Embedding via Global Sentiment Representation , 2018, AAAI.
[25] Sepp Hochreiter,et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.
[26] Sjoerd van Steenkiste,et al. FVD: A new Metric for Video Generation , 2019, DGS@ICLR.
[27] P. Hagoort,et al. Synchronization of speech and gesture: evidence for interaction in action. , 2014, Journal of experimental psychology. General.
[28] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.
[29] Phillip Isola,et al. On the "steerability" of generative adversarial networks , 2019, ICLR.
[30] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[31] S. Levine,et al. Gesture controllers , 2010, ACM Trans. Graph..
[32] Bilge Mutlu,et al. Learning-Based Modeling of Multimodal Behaviors for Humanlike Robots , 2014, 2014 9th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[33] Tomas Mikolov,et al. Enriching Word Vectors with Subword Information , 2016, TACL.
[34] Carlos Busso,et al. Speech-driven Animation with Meaningful Behaviors , 2017, Speech Commun..
[35] Hans-Peter Seidel,et al. Annotated New Text Engine Animation Animation Lexicon Animation Gesture Profiles MR : . . . JL : . . . Gesture Generation Video Annotated Gesture Script , 2007 .
[36] Dominik Roblek,et al. Fréchet Audio Distance: A Metric for Evaluating Music Enhancement Algorithms , 2018, ArXiv.
[37] J. Burgoon,et al. Nonverbal Behaviors, Persuasion, and Credibility , 1990 .
[38] Stacy Marsella,et al. Predicting Co-verbal Gestures: A Deep and Temporal Modeling Approach , 2015, IVA.
[39] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[40] Stefan Kopp,et al. Gesture and speech in interaction: An overview , 2014, Speech Commun..
[41] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[42] Gustav Eje Henter,et al. Gesticulator: A framework for semantically-aware speech-driven gesture generation , 2020, ICMI.
[43] Alberto Menache,et al. Understanding Motion Capture for Computer Animation and Video Games , 1999 .
[44] Louis-Philippe Morency,et al. Language2Pose: Natural Language Grounded Pose Forecasting , 2019, 2019 International Conference on 3D Vision (3DV).
[45] Vladlen Koltun,et al. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling , 2018, ArXiv.
[46] Youngwoo Yoon,et al. Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots , 2018, 2019 International Conference on Robotics and Automation (ICRA).
[47] Motion, Interaction and Games , 2019, MIG.
[48] Petra Himmel. Hand And Mind What Gestures Reveal About Thought , 2016 .
[49] Seunghoon Hong,et al. Diversity-Sensitive Conditional Generative Adversarial Networks , 2019, ICLR.
[50] Cristian Sminchisescu,et al. Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[51] Daan Wierstra,et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.
[52] Taewoo Kim,et al. C-3PO: Cyclic-Three-Phase Optimization for Human-Robot Motion Retargeting based on Reinforcement Learning , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).
[53] Sotaro Kita,et al. How representational gestures help speaking , 2000 .
[54] Andreas Aristidou,et al. Folk Dance Evaluation Using Laban Movement Analysis , 2015, JOCCH.
[55] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[56] Sriram Subramanian,et al. The effects of robot-performed co-verbal gesture on listener behaviour , 2011, 2011 11th IEEE-RAS International Conference on Humanoid Robots.
[57] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[58] Gabriel Skantze,et al. Multimodal Continuous Turn-Taking Prediction Using Multiscale RNNs , 2018, ICMI.
[59] D. McNeill. Gesture and Thought , 2005 .
[60] Wei Chu,et al. Extensions of Gaussian Processes for Ranking : Semi-supervised and Active Learning , 2005 .
[61] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[62] Dominik Roblek,et al. Fréchet Audio Distance: A Reference-Free Metric for Evaluating Music Enhancement Algorithms , 2019, INTERSPEECH.