A Novel Dynamic Model Capturing Spatial and Temporal Patterns for Facial Expression Analysis

Facial expression analysis could be greatly improved by incorporating the spatial and temporal patterns present in facial behavior, but these patterns have not yet been fully exploited. We address this with a novel dynamic model, an interval temporal restricted Boltzmann machine (IT-RBM), that captures both universal spatial patterns and complicated temporal patterns in facial behavior. We regard a facial expression as a complex activity composed of sequential or overlapping primitive facial events. Allen's interval algebra is used to describe these temporal patterns via a two-layer Bayesian network: the nodes in the upper layer represent the primitive facial events, and the nodes in the lower layer represent the temporal relationships between those events. Our model also captures inherent universal spatial patterns via a multi-value restricted Boltzmann machine in which the visible nodes are facial events and the connections between hidden and visible nodes model intrinsic spatial patterns. Efficient learning and inference algorithms are proposed. Experiments on distinguishing posed from spontaneous expressions and on expression recognition demonstrate that the proposed IT-RBM outperforms state-of-the-art methods because it incorporates these facial behavior patterns.
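
To make the temporal component more concrete, the following is a minimal sketch of how one of Allen's thirteen interval relations could be assigned to a pair of primitive facial events, each treated as a closed interval with a start and an end. It is an illustration only, not the model's implementation: the function name `allen_relation`, the event names, and the frame indices are all hypothetical.

```python
from typing import Tuple

# An interval is (start, end) with start < end, e.g. frame indices.
Interval = Tuple[float, float]


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen interval relation that holds from interval a to interval b."""
    a_start, a_end = a
    b_start, b_end = b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("each interval must satisfy start < end")

    # Disjoint or touching intervals.
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"

    # Intervals sharing one or both endpoints.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"

    # Strict containment in either direction.
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"

    # Only a partial overlap remains.
    return "overlaps" if a_start < b_start else "overlapped-by"


# Hypothetical primitive facial events with made-up frame ranges.
events = {
    "brow_raise": (2, 10),
    "lip_corner_pull": (5, 14),
    "jaw_drop": (10, 18),
}

relation_1 = allen_relation(events["brow_raise"], events["lip_corner_pull"])
relation_2 = allen_relation(events["brow_raise"], events["jaw_drop"])
print(relation_1, relation_2)
```

In a two-layer network of the kind described above, relation labels like these would be the values taken by the lower-layer nodes, one per pair of events; how the actual model discretizes or parameterizes them is not specified here.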
