Uncertainty Flow Facilitates Zero-Shot Multi-Label Learning in Affective Facial Analysis

Featured Application: The proposed Uncertainty Flow framework may benefit facial analysis through the improved discriminability it offers in multi-label affective classification tasks. The framework also enables efficient model training and knowledge transfer between tasks. Applications that rely heavily on continuous prediction of emotional valence, e.g., monitoring prisoners' emotional stability in jail, can benefit directly from our framework.

Abstract: Reducing the dependence of affective facial analysis on single labels calls for multi-label affective learning. The main obstacle to the practical implementation of existing multi-label algorithms is the scarcity of scalable multi-label training datasets. To resolve this, this research puts forward an inductive transfer learning framework, Uncertainty Flow, which allows knowledge to be transferred from a single-label emotion recognition task to a multi-label affective recognition task. Specifically, the model uncertainty, which can be quantified within Uncertainty Flow, is distilled from a single-label learning task, and the distilled model uncertainty then enables efficient zero-shot multi-label affective learning. From a theoretical perspective, this research fully explores the feasibility of applying weakly informative priors, e.g., uniform and Cauchy priors, within the Uncertainty Flow framework. More importantly, based on the derived weight uncertainty, three sets of prediction-related uncertainty indexes, i.e., soft-max uncertainty, pure uncertainty and uncertainty plus, are proposed to produce reliable and accurate multi-label predictions. Validated on our manually annotated evaluation dataset, i.e., the multi-label annotated FER2013, our proposed Uncertainty Flow outperformed conventional multi-label learning algorithms and multi-label compatible neural networks in multi-label facial expression analysis.
The success of our proposed Uncertainty Flow offers a glimpse of the future of continuous, uncertain, and multi-label affective computing.
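The abstract does not define the soft-max uncertainty, pure uncertainty or uncertainty plus indexes, so the following is only a minimal sketch under our own assumptions of the general idea: a weakly informative Cauchy (or uniform) prior on individual weights, and a multi-label prediction obtained by averaging soft-max outputs over several posterior weight draws and keeping every class whose mean probability reaches a threshold. The function names, the 0.25 threshold, and the use of predictive entropy and per-class spread as uncertainty measures are hypothetical illustrations, not the paper's method.

```python
import math
from statistics import mean, pstdev


def cauchy_log_prior(w, scale=1.0):
    """Log-density of a zero-centred Cauchy(0, scale) prior on a single weight."""
    return -math.log(math.pi * scale) - math.log1p((w / scale) ** 2)


def uniform_log_prior(w, bound=1.0):
    """Log-density of a Uniform(-bound, bound) prior; -inf outside the support."""
    return -math.log(2.0 * bound) if -bound <= w <= bound else -math.inf


def softmax(logits):
    """Numerically stable soft-max over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multilabel_from_samples(logit_samples, threshold=0.25):
    """Hypothetical multi-label rule built on posterior weight samples.

    logit_samples: one logit vector per posterior weight draw.
    Returns (mean_probs, per_class_spread, predictive_entropy, labels).
    """
    probs = [softmax(z) for z in logit_samples]
    n_classes = len(probs[0])
    # Average the soft-max output across weight draws (Monte Carlo estimate).
    mean_probs = [mean(p[k] for p in probs) for k in range(n_classes)]
    # Spread of each class probability across draws reflects weight uncertainty.
    spread = [pstdev([p[k] for p in probs]) for k in range(n_classes)]
    # Entropy of the averaged distribution as one overall uncertainty measure.
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0)
    # Keep every class whose mean probability reaches the threshold.
    labels = [k for k, p in enumerate(mean_probs) if p >= threshold]
    return mean_probs, spread, entropy, labels


# Three hypothetical posterior draws over three emotion classes.
draws = [[2.0, 1.8, -1.0], [1.9, 2.1, -0.5], [2.2, 1.7, -1.2]]
mean_probs, spread, entropy, labels = multilabel_from_samples(draws)
print(labels)  # classes 0 and 1 are both retained
```

Because the two leading classes share most of the probability mass across draws, a single-label argmax would discard one of them, whereas the thresholded rule keeps both; the threshold itself would need to be tuned on annotated data.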