Classifying Skewed Data: Importance Weighting to Optimize Average Recall

Promoted in part by its use in the Interspeech Challenges of 2009-2012, Average Recall has emerged as an attractive measure of classifier performance when the data have a skewed class distribution. In this paper, we show that importance weighting can be used to optimize Average Recall directly. We compare this approach to sampling techniques previously used to classify skewed data. We demonstrate the approach on the Interspeech 2009 Emotion Challenge tasks and on prosodic analysis tasks.
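To make the connection between importance weighting and Average Recall concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes that Average Recall is the unweighted mean of per-class recall, and that each instance of class c receives the weight N / (K * n_c), where N is the number of instances, K the number of classes, and n_c the size of class c. This inverse-frequency scheme and the function names `average_recall`, `importance_weights`, and `weighted_accuracy` are illustrative assumptions. Under these weights, weighted accuracy coincides with Average Recall, which is why a learner that minimizes weighted error can be expected to target Average Recall.

```python
from collections import Counter


def average_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def importance_weights(y_true):
    """Per-instance weight N / (K * n_c); an assumed inverse-frequency scheme."""
    counts = Counter(y_true)
    n, k = len(y_true), len(counts)
    return [n / (k * counts[t]) for t in y_true]


def weighted_accuracy(y_true, y_pred, weights):
    """Share of the total instance weight that falls on correct predictions."""
    correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
    return correct / sum(weights)


# Toy skewed data: 8 instances of class "a", 2 of class "b".
y_true = ["a"] * 8 + ["b"] * 2
# A classifier that gets every "a" right but only one of the two "b" instances.
y_pred = ["a"] * 8 + ["a", "b"]

weights = importance_weights(y_true)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
ar = average_recall(y_true, y_pred)
wa = weighted_accuracy(y_true, y_pred, weights)
print(plain_accuracy, ar, wa)
```

On this toy data, plain accuracy rewards the majority class, while the weighted accuracy matches Average Recall. In practice, such weights would typically be passed to a learner that accepts per-instance weights during training.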
