Modeling Multiple Time Series Annotations as Noisy Distortions of the Ground Truth: An Expectation-Maximization Approach

Studies of time-continuous human behavioral phenomena often rely on ratings from multiple annotators. Since the ground truth of the target construct is often latent, the standard practice is to use ad-hoc metrics (such as averaging annotator ratings). Despite being easy to compute, such metrics may not provide accurate representations of the underlying construct. In this paper, we present a novel method for modeling multiple time series annotations over a continuous variable that computes the ground truth by modeling annotator-specific distortions. We condition the ground truth on a set of features extracted from the data and further assume that the annotators provide their ratings as modifications of the ground truth, with each annotator having specific distortion tendencies. We train the model using an Expectation-Maximization-based algorithm and evaluate it on a study involving natural interaction between a child and a psychologist, to predict confidence ratings of the children’s smiles. We compare and analyze the model against two baselines: (i) the ground truth is considered to be the framewise mean of the ratings from the annotators, and (ii) each annotator is assumed to bear a distinct time delay in annotation, and the annotations are aligned before computing the framewise mean.
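The two baselines described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `max_delay` parameter, the edge padding, and the choice to estimate each annotator's delay by minimizing the mean squared error against the unaligned framewise mean are all assumptions made for illustration, since the abstract does not say how the delays are estimated.

```python
import numpy as np


def framewise_mean(ratings):
    """Baseline (i): average the annotators at every frame.

    ratings: array-like of shape (num_annotators, num_frames).
    """
    return np.asarray(ratings, dtype=float).mean(axis=0)


def shift_back(series, delay):
    """Advance a series by `delay` frames to undo an annotation lag.

    The tail is padded with the last observed value (an assumption).
    """
    series = np.asarray(series, dtype=float)
    if delay == 0:
        return series.copy()
    return np.concatenate([series[delay:], np.full(delay, series[-1])])


def estimate_delay(series, reference, max_delay):
    """Pick the delay in [0, max_delay] whose shifted series is closest
    to `reference` in mean squared error (hypothetical criterion)."""
    reference = np.asarray(reference, dtype=float)
    return min(
        range(max_delay + 1),
        key=lambda d: np.mean((shift_back(series, d) - reference) ** 2),
    )


def delay_aligned_mean(ratings, max_delay):
    """Baseline (ii): align each annotator with its own delay, then average."""
    ratings = np.asarray(ratings, dtype=float)
    reference = ratings.mean(axis=0)  # assumed alignment target
    aligned = [
        shift_back(r, estimate_delay(r, reference, max_delay)) for r in ratings
    ]
    return np.mean(aligned, axis=0)
```

With `max_delay=0` the second baseline reduces to the first, which makes the sketch easy to sanity-check. The proposed model differs from both in that it estimates the ground truth jointly with annotator-specific distortions rather than averaging the ratings.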
