The Cost of Dichotomizing Continuous Labels for Binary Classification Problems: Deriving a Bayesian-Optimal Classifier

Many pattern recognition problems involve characterizing samples with continuous labels instead of discrete categories. Although regression models suit these learning tasks, the labels are often discretized into binary classes so that the problem can be posed as a conventional classification task (e.g., low versus high values). This practice places intrinsic limits on classification performance. The continuous labels are typically normally distributed, so many samples lie close to the boundary threshold, which results in poor classification rates. Previous studies train binary classifiers using only the discretized labels and neglect the original continuous labels. This study demonstrates that, even in binary classification problems, exploiting the original labels before splitting the classes can lead to better classification performance. We propose an optimal classifier for these problems based on the Bayesian maximum a posteriori (MAP) criterion, which makes effective use of the real-valued labels. We derive the theoretical average performance of this classifier, which can be regarded as the expected upper bound on performance for the task. Experimental evaluations on synthetic and real data sets show the improvement achieved by the proposed classifier over conventional classifiers trained with binary labels. These evaluations support the optimality of the proposed classifier and the precision of the expected upper bound obtained by our derivation.
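The idea can be illustrated with a minimal sketch under simplifying assumptions that are not taken from the paper: a single feature `x` and a continuous label `y` that are jointly Gaussian with correlation `rho`, and a split of `y` at its median (zero here). In that setting the MAP rule reduces to thresholding the conditional mean of `y` given `x`, and its expected accuracy follows the classical bivariate-normal result 1/2 + arcsin(rho)/pi. The function names and parameter values below are hypothetical, chosen only for illustration.

```python
import numpy as np


def simulate_map_accuracy(rho, n=200_000, seed=0):
    """Empirical accuracy of thresholding a regression fit on continuous labels.

    Assumes (x, y) are standard bivariate normal with correlation rho and
    that the binary class is defined by a median split of y (y > 0).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

    # Binary classes obtained by dichotomizing the continuous label.
    labels = y > 0.0

    # Least-squares line fitted on the continuous labels estimates E[y | x].
    slope, intercept = np.polyfit(x, y, 1)

    # Under joint normality, P(y > 0 | x) > 0.5 exactly when E[y | x] > 0.
    predictions = slope * x + intercept > 0.0
    return float(np.mean(predictions == labels))


def expected_accuracy(rho):
    """Closed-form accuracy for the median-split bivariate normal case."""
    return 0.5 + np.arcsin(rho) / np.pi


if __name__ == "__main__":
    for rho in (0.3, 0.6, 0.9):
        print(rho, simulate_map_accuracy(rho), expected_accuracy(rho))
```

Even with a correlation of 0.6 the closed form gives an accuracy of only about 0.70, which reflects how many samples sit near the threshold when the labels are normally distributed.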
