Predicting head pose from speech
[1] Carlo Magi,et al. Properties of line spectrum pair polynomials: a review , 2006 .
[2] Paul Debevec,et al. The Digital Emily project: photoreal facial modeling and animation , 2009, SIGGRAPH '09.
[3] Robert M. Gray,et al. An Algorithm for Vector Quantizer Design , 1980, IEEE Trans. Commun..
[4] S. Kopp,et al. The Effects of an Embodied Agent's Nonverbal Behavior on User's Evaluation and Behavioral Mimicry , 2007 .
[5] Brian Kingsbury,et al. New types of deep neural network learning for speech recognition and related applications: an overview , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[6] Mark Johnson,et al. An Improved Non-monotonic Transition System for Dependency Parsing , 2015, EMNLP.
[7] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[8] H. Brenton,et al. The Uncanny Valley: does it exist? , 2005 .
[9] Jürgen Schmidhuber,et al. LSTM: A Search Space Odyssey , 2015, IEEE Transactions on Neural Networks and Learning Systems.
[10] A. Kendon. Do Gestures Communicate? A Review , 1994 .
[11] Michael M. Cohen,et al. Modeling Coarticulation in Synthetic Visual Speech , 1993 .
[12] Yukiko I. Nakano,et al. MACK: Media lab Autonomous Conversational Kiosk , 2002 .
[13] S. Goldin-Meadow,et al. Silence is liberating: removing the handcuffs on grammatical expression in the manual modality. , 1996, Psychological review.
[14] Jaakko Lehtinen,et al. Production-level facial performance capture using deep convolutional neural networks , 2016, Symposium on Computer Animation.
[15] Iain Matthews,et al. Modeling and animating eye blinks , 2011, TAP.
[16] H. Schussler,et al. A stability theorem for discrete systems , 1976 .
[17] Pascal Vincent,et al. Dropout as data augmentation , 2015, ArXiv.
[18] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.
[19] Justine Cassell,et al. BEAT: the Behavior Expression Animation Toolkit , 2001, Life-like characters.
[20] Fan Bo. Head motion generation for speech-driven talking avatar , 2013 .
[21] Atef Ben Youssef,et al. Articulatory features for speech-driven head motion synthesis , 2013, INTERSPEECH.
[22] Marc Leman,et al. Content-Based Music Information Retrieval: Current Directions and Future Challenges , 2008, Proceedings of the IEEE.
[23] Björn W. Schuller,et al. Building autonomous sensitive artificial listeners (Extended abstract) , 2015, 2015 International Conference on Affective Computing and Intelligent Interaction (ACII).
[24] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .
[25] Roger K. Moore. A Bayesian explanation of the ‘Uncanny Valley’ effect and related psychological phenomena , 2012, Scientific Reports.
[26] J. Makhoul,et al. Linear prediction: A tutorial review , 1975, Proceedings of the IEEE.
[27] Alex Graves,et al. Supervised Sequence Labelling , 2012 .
[28] Tara N. Sainath,et al. Improvements to filterbank and delta learning within a deep neural network framework , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] B. Butterworth,et al. Gesture, speech, and computational stages: a reply to McNeill. , 1989, Psychological review.
[30] V. Yngve. On getting a word in edgewise , 1970 .
[31] Neil A. Macmillan,et al. Detection Theory: A User's Guide , 1991 .
[32] Gregor Hofer,et al. Automatic head motion prediction from speech data , 2007, INTERSPEECH.
[33] Jeffrey L. Elman,et al. Finding Structure in Time , 1990, Cogn. Sci..
[34] Robert C. Hubal,et al. How do varied populations interact with embodied conversational agents? Findings from inner-city adolescents and prisoners , 2008, Comput. Hum. Behav..
[35] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[36] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[37] Michael I. Jordan. Attractor dynamics and parallelism in a connectionist sequential machine , 1990 .
[38] Stephen D. Laycock,et al. Predicting Head Pose from Speech with a Conditional Variational Autoencoder , 2017, INTERSPEECH.
[39] Brian Butterworth,et al. Gesture and Silence as Indicators of Planning in Speech , 1978 .
[40] Chris Dyer,et al. On the State of the Art of Evaluation in Neural Language Models , 2017, ICLR.
[41] Zhigang Deng,et al. Natural head motion synthesis driven by acoustic prosodic features , 2005, Comput. Animat. Virtual Worlds.
[42] Wesley Mattheyses,et al. Audiovisual speech synthesis: An overview of the state-of-the-art , 2015, Speech Commun..
[43] A. A. Mullin,et al. Principles of neurodynamics , 1962 .
[44] John Salvatier,et al. Theano: A Python framework for fast computation of mathematical expressions , 2016, ArXiv.
[45] E. Vatikiotis-Bateson,et al. Kinematics-Based Synthesis of Realistic Talking Faces , 1998, AVSP.
[46] Timothy F. Cootes,et al. Statistical models of appearance for medical image analysis and computer vision , 2001, SPIE Medical Imaging.
[47] Björn Stenger,et al. Expressive Visual Text-to-Speech Using Active Appearance Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[48] Jean-Yves Bouguet,et al. Camera calibration toolbox for matlab , 2001 .
[49] J. Loomis,et al. Interpersonal Distance in Immersive Virtual Environments , 2003, Personality & social psychology bulletin.
[50] M. Mori. The Uncanny Valley , 2020, The Monster Theory Reader.
[51] Martial Hebert,et al. An Uncertain Future: Forecasting from Static Images Using Variational Autoencoders , 2016, ECCV.
[52] Lei Xie,et al. Head motion synthesis from speech using deep neural networks , 2015, Multimedia Tools and Applications.
[53] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[54] Albrecht Rüdiger,et al. Spectrum and spectral density estimation by the Discrete Fourier transform (DFT), including a comprehensive list of window functions and some new flat-top windows , 2002 .
[55] Yisong Yue,et al. A deep learning approach for generalized speech animation , 2017, ACM Trans. Graph..
[56] Beth Logan,et al. Mel Frequency Cepstral Coefficients for Music Modeling , 2000, ISMIR.
[57] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[58] Stephen D. Laycock,et al. Joint Learning of Facial Expression and Head Pose from Speech , 2018, INTERSPEECH.
[59] Naomi H. Feldman,et al. The influence of categories on perception: explaining the perceptual magnet effect as optimal statistical inference. , 2009, Psychological review.
[60] C. G. Fisher,et al. Confusions among visually perceived consonants. , 1968, Journal of speech and hearing research.
[61] Michael J. Black,et al. Tracking and recognizing rigid and non-rigid facial motions using local parametric models of image motion , 1995, Proceedings of IEEE International Conference on Computer Vision.
[62] Stephen A. Zahorian,et al. Yet Another Algorithm for Pitch Tracking , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[63] P. Ekman,et al. Facial action coding system: a technique for the measurement of facial movement , 1978 .
[64] Emile A. Hendriks,et al. Action unit classification using active appearance models and conditional random fields , 2011, Cognitive Processing.
[65] John E. Markel,et al. Linear Prediction of Speech , 1976, Communication and Cybernetics.
[66] Hongdong Li,et al. A simple prior-free method for non-rigid structure-from-motion factorization , 2012, CVPR.
[67] A.R.D. Thornton,et al. Foundations of Modern Auditory Theory , 1970 .
[68] F. Itakura. Line spectrum representation of linear predictor coefficients of speech signals , 1975 .
[69] James J. Filliben,et al. NIST/SEMATECH e-Handbook of Statistical Methods; Chapter 1: Exploratory Data Analysis , 2003 .
[70] C. Creider. Hand and Mind: What Gestures Reveal about Thought , 1994 .
[71] Frédéric H. Pighin,et al. Expressive speech-driven facial animation , 2005, TOGS.
[72] Slav Petrov,et al. Globally Normalized Transition-Based Neural Networks , 2016, ACL.
[73] F. Rosenblatt,et al. The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.
[74] Arthur Schuster,et al. On the investigation of hidden periodicities with application to a supposed 26 day period of meteorological phenomena , 1898 .
[75] T. Kanade,et al. Reconstructing 3D Human Pose from 2D Image Landmarks , 2012, ECCV.
[76] E. Jentsch. On the psychology of the uncanny (1906) , 1997 .
[77] E. B. Newman,et al. A Scale for the Measurement of the Psychological Magnitude Pitch , 1937 .
[78] Hiroshi Ishiguro,et al. The Perception of Humans and Robots: Uncanny Hills in Parietal Cortex , 2010 .
[79] Takaaki Kuratate,et al. Audio-visual synthesis of talking faces from speech production correlates. , 1999 .
[80] Dong Yu,et al. Speech emotion recognition using deep neural network and extreme learning machine , 2014, INTERSPEECH.
[81] M. Black. Avatars , 2008, BMJ : British Medical Journal.
[82] Hideki Kawahara,et al. YIN, a fundamental frequency estimator for speech and music. , 2002, The Journal of the Acoustical Society of America.
[83] Bernhard P. Wrobel,et al. Multiple View Geometry in Computer Vision , 2001 .
[84] Ira Kemelmacher-Shlizerman,et al. Synthesizing Obama , 2017, ACM Trans. Graph..
[85] Christoph Bregler,et al. Video Rewrite: Driving Visual Speech with Audio , 1997, SIGGRAPH.
[86] H. Hotelling. Relations Between Two Sets of Variates , 1936 .
[87] Yoshua Bengio,et al. Deep Generative Stochastic Networks Trainable by Backprop , 2013, ICML.
[88] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.
[89] Samy Bengio,et al. Generating Sentences from a Continuous Space , 2015, CoNLL.
[90] John P. Lewis,et al. Universal capture: image-based facial animation for "The Matrix Reloaded" , 2003, SIGGRAPH '03.
[91] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[92] J. Cassell,et al. Nudge nudge wink wink: elements of face-to-face conversation for embodied conversational agents , 2001 .
[93] Dirk Heylen,et al. Generation of Facial Expressions from Emotion Using a Fuzzy Rule Based System , 2001, Australian Joint Conference on Artificial Intelligence.
[94] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[95] C. Hjortsjö. Man's face and mimic language , 1969 .
[96] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[97] Nigel G. Ward,et al. Prosodic features which cue back-channel responses in English and Japanese , 2000 .
[98] Louis-Philippe Morency,et al. Predicting Listener Backchannels: A Probabilistic Multimodal Approach , 2008, IVA.
[99] Florian Metze,et al. Extracting deep bottleneck features using stacked auto-encoders , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[100] Geoffrey Zweig,et al. Recent advances in deep learning for speech research at Microsoft , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[101] Matthew Brand,et al. Voice puppetry , 1999, SIGGRAPH.
[102] Yifan Gong,et al. An analysis of convolutional neural networks for speech recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[103] Heiga Zen,et al. Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences , 2007, Comput. Speech Lang..
[104] Carlos Busso,et al. Joint Learning of Speech-Driven Facial Motion with Bidirectional Long-Short Term Memory , 2017, IVA.
[105] Joakim Nivre,et al. On the Semantics and Pragmatics of Linguistic Feedback , 1992, J. Semant..
[106] Mark Steedman,et al. Animated conversation: rule-based generation of facial expression, gesture & spoken intonation for multiple conversational agents , 1994, SIGGRAPH.
[107] Zhengyou Zhang,et al. Flexible camera calibration by viewing a plane from unknown orientations , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.
[108] Samuel R. Bowman,et al. A Gold Standard Dependency Corpus for English , 2014, LREC.
[109] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[110] Yonghui Wu,et al. Exploring the Limits of Language Modeling , 2016, ArXiv.
[111] Lei Xie,et al. BLSTM neural networks for speech driven head motion synthesis , 2015, INTERSPEECH.
[112] M. Schroeder. Period histogram and product spectrum: new methods for fundamental-frequency measurement. , 1968, The Journal of the Acoustical Society of America.
[113] Honglak Lee,et al. Learning Structured Output Representation using Deep Conditional Generative Models , 2015, NIPS.
[114] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[115] Jinho D. Choi. Dynamic Feature Induction: The Last Gist to the State-of-the-Art , 2016, NAACL.
[116] Frank K. Soong,et al. Text Driven 3D Photo-Realistic Talking Head , 2011, INTERSPEECH.
[117] Tara N. Sainath,et al. Improvements to Deep Convolutional Neural Networks for LVCSR , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[118] Hiroshi Shimodaira,et al. Bidirectional LSTM Networks Employing Stacked Bottleneck Features for Expressive Speech-Driven Head Motion Synthesis , 2016, IVA.
[119] Zhigang Deng,et al. Rigid Head Motion in Expressive Speech Animation: Analysis and Synthesis , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[120] Ray L. Birdwhistell,et al. Introduction to kinesics: an annotation system for analysis of body motion and gesture , 1952 .
[121] Volker Strom,et al. Visual prosody: facial movements accompanying speech , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.
[122] George Trigeorgis,et al. Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[123] Stephen D. Laycock,et al. Predicting Head Pose in Dyadic Conversation , 2017, IVA.
[124] Paul Boersma,et al. Praat, a system for doing phonetics by computer , 2002 .
[125] Alex Graves,et al. Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.
[126] Daan Wierstra,et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.
[127] Etienne de Sevin,et al. A listener model: introducing personality traits , 2012, Journal on Multimodal User Interfaces.
[128] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[129] Theodore Raphan,et al. Rotation axes of the head during positioning, head shaking, and locomotion. , 2007, Journal of neurophysiology.
[130] P. Ekman,et al. The Repertoire of Nonverbal Behavior: Categories, Origins, Usage, and Coding , 1969 .
[131] Moshe Mahler,et al. Dynamic units of visual speech , 2012, SCA '12.
[132] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.
[133] A. Noll. Cepstrum pitch determination. , 1967, The Journal of the Acoustical Society of America.
[134] K. Dautenhahn,et al. Towards interactive robots in autism therapy: background, motivation and challenges , 2004 .
[135] Simon Baker,et al. Equivalence and efficiency of image alignment algorithms , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.
[136] Timothy F. Cootes,et al. Face Recognition Using Active Appearance Models , 1998, ECCV.
[137] J. Graftieaux. [The uncanny]. , 2011, Annales francaises d'anesthesie et de reanimation.
[138] V. Tiwari. MFCC and its applications in speaker recognition , 2010 .
[139] J. Gower. Generalized procrustes analysis , 1975 .
[140] Thomas Gold,et al. Hearing , 1953, Trans. IRE Prof. Group Inf. Theory.
[141] Timothy F. Cootes,et al. Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..
[142] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.
[143] T. Wickens. Elementary Signal Detection Theory , 2001 .
[144] Richard D. Hichwa,et al. A neural basis for lexical retrieval , 1996, Nature.
[145] Stéphane Bouchard,et al. Virtual Reality Therapy Versus Cognitive Behavior Therapy for Social Phobia: A Preliminary Controlled Study , 2005, Cyberpsychology Behav. Soc. Netw..
[146] Zhigang Deng,et al. Audio-based head motion synthesis for Avatar-based telepresence systems , 2004, ETP '04.
[147] Jeffery A. Jones,et al. Visual Prosody and Speech Intelligibility , 2004, Psychological science.