Sequential Labeling With Structural SVM Under Nondecomposable Losses

Sequential labeling addresses the classification of sequential data, which are widespread in fields as diverse as computer vision, finance, and genomics. The model traditionally used for sequential labeling is the hidden Markov model (HMM), in which the sequence of class labels to be predicted is encoded as a Markov chain. In recent years, HMMs have benefited from minimum-loss training approaches such as the structural support vector machine (SSVM), which in many cases has achieved higher classification accuracy. However, the loss functions available for training are restricted to decomposable cases such as the 0–1 loss and the Hamming loss. In many practical settings, other loss functions, such as those based on the $F_{1}$ measure, the precision/recall break-even point, and the average precision (AP), describe the desired performance more effectively. For this reason, in this paper we propose a training algorithm for the SSVM that can minimize any loss based on the classification contingency table, together with a training algorithm that minimizes an AP loss. Experimental results over a set of diverse and challenging data sets (TUM Kitchen, CMU Multimodal Activity, and Ozone Level Detection) show that the proposed training algorithms achieve significant improvements in the $F_{1}$ measure and AP compared with the conventional SSVM, and that their performance is comparable with or above that of other state-of-the-art sequential labeling approaches.
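To make the decomposable/nondecomposable distinction concrete, the minimal sketch below (illustrative only, not the paper's algorithm; function names and the example sequences are our own) contrasts the Hamming loss, which is a sum of independent per-position terms, with the $F_{1}$ loss, which depends on the full contingency table (TP, FP, FN) of the whole sequence and therefore cannot be decomposed position by position.

```python
# Illustrative sketch: decomposable (Hamming) vs. nondecomposable (F1)
# losses over a binary label sequence. Not taken from the paper.

def hamming_loss(y_true, y_pred):
    """Decomposable: a sum of independent per-position error terms."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_loss(y_true, y_pred, positive=1):
    """Nondecomposable: computed from the sequence-level contingency table."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 1.0  # degenerate case: no true positive was predicted
    f1 = 2 * tp / (2 * tp + fp + fn)
    return 1.0 - f1

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 0]
print(hamming_loss(y_true, y_pred))  # 0.4  (2 mismatches out of 5)
print(f1_loss(y_true, y_pred))       # 0.5  (tp=1, fp=0, fn=2)
```

Because the $F_{1}$ loss (and likewise the break-even point and AP losses) couples all positions through the contingency table, the loss-augmented inference step of standard SSVM training no longer factorizes along the chain, which is the difficulty the proposed training algorithms address.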
