Action Recognition in Videos

We construct a fully automatic large-scale system for visual retrieval of realistic action samples of different human action classes from TV-series and movies. We first propose a text-driven approach using the synchronization of transcripts and subtitles. In practice this yields a substantial part of irrelevant samples. We handle them by ranking the whole retrieved set by visual consistency using some partially incorrect training data. Our main contribution is to provide a new generic and fully automatic iterative training scheme for support vector regression that can handle such erroneous supervision to improve the ranking quality. We validate our approach by conducting experiments on realistic video data and showing that it performs better than existing state-of-the-art unsupervised and supervised methods.

[1]  Ivor W. Tsang,et al.  Maximum Margin Clustering Made Practical , 2007, IEEE Transactions on Neural Networks.

[2]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Cordelia Schmid,et al.  Automatic face naming with caption-based supervision , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Antonio Criminisi,et al.  Harvesting Image Databases from the Web , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[5]  Fei-Fei Li,et al.  OPTIMOL: Automatic Online Picture Collection via Incremental Model Learning , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Sebastian Nowozin,et al.  Discriminative Subsequence Mining for Action Classification , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[7]  Thomas Serre,et al.  A Biologically Inspired System for Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[8]  Martial Hebert,et al.  Event Detection in Crowded Videos , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[9]  Mubarak Shah,et al.  Chaotic Invariants for Human Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[10]  Patrick Pérez,et al.  Retrieving actions in movies , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[11]  Rémi Ronfard,et al.  Action Recognition from Arbitrary Views using 3D Exemplars , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12]  Cordelia Schmid,et al.  Learning Object Representations for Visual Object Class Recognition , 2007, ICCV 2007.

[13]  Andrew Zisserman,et al.  Representing shape with a spatial pyramid kernel , 2007, CIVR '07.

[14]  Ling Li,et al.  Ordinal Regression by Extended Binary Classification , 2006, NIPS.

[15]  Hang Li,et al.  Cost-Sensitive Learning of SVM for Ranking , 2006, ECML.

[16]  Ben Carterette,et al.  Learning a ranking from pairwise preferences , 2006, SIGIR '06.

[17]  Tong Zhang,et al.  Subset Ranking Using Regression , 2006, COLT.

[18]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[19]  David A. Forsyth,et al.  Animals on the Web , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[20]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[21]  Pinar Duygulu Sahin,et al.  A Graph Based Approach for Naming Faces in News Photos , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[22]  Andrew Zisserman,et al.  Hello! My name is... Buffy'' -- Automatic Naming of Characters in TV Video , 2006, BMVC.

[23]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[24]  Massih-Reza Amini,et al.  Semi-supervised learning with an imperfect supervisor , 2005, Knowledge and Information Systems.

[25]  Pietro Perona,et al.  Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[26]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[27]  Dan Roth,et al.  Learnability of Bipartite Ranking Functions , 2005, COLT.

[28]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Xiaojin Zhu,et al.  --1 CONTENTS , 2006 .

[30]  Patrice Y. Simard,et al.  Using Machine Learning to Break Visual Human Interaction Proofs (HIPs) , 2004, NIPS.

[31]  Dan Roth,et al.  A Large Deviation Bound for the Area Under the ROC Curve , 2004, NIPS.

[32]  Dale Schuurmans,et al.  Maximum Margin Clustering , 2004, NIPS.

[33]  Victoria J. Hodge,et al.  A Survey of Outlier Detection Methodologies , 2004, Artificial Intelligence Review.

[34]  Bernhard Schölkopf,et al.  A tutorial on support vector regression , 2004, Stat. Comput..

[35]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[36]  Lixin Fan,et al.  Categorizing Nine Visual Classes using Local Appearance Descriptors , 2004 .

[37]  Yunqian Ma,et al.  Practical selection of SVM parameters and noise estimation for SVM regression , 2004, Neural Networks.

[38]  Tamara L. Berg,et al.  Names and faces in the news , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[39]  B. Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[40]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[41]  Bernhard Schölkopf,et al.  Cluster Kernels for Semi-Supervised Learning , 2002, NIPS.

[42]  Bernhard Schölkopf,et al.  Estimating a Kernel Fisher Discriminant in the Presence of Label Noise , 2001, ICML.

[43]  Moses Charikar,et al.  Greedy approximation algorithms for finding dense components in a graph , 2000, APPROX.

[44]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[45]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[46]  Sridhar Ramaswamy,et al.  Efficient algorithms for mining outliers from large data sets , 2000, SIGMOD '00.

[47]  Patrick Haffner,et al.  Support vector machines for histogram-based image classification , 1999, IEEE Trans. Neural Networks.

[48]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[49]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[50]  Bernhard Schölkopf,et al.  SV Estimation of a Distribution's Support , 1999, NIPS 1999.

[51]  Ayhan Demiriz,et al.  Semi-Supervised Support Vector Machines , 1998, NIPS.

[52]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[53]  Yoram Singer,et al.  An Efficient Boosting Algorithm for Combining Preferences by , 2013 .

[54]  J. C. BurgesChristopher A Tutorial on Support Vector Machines for Pattern Recognition , 1998 .

[55]  Alexander J. Smola,et al.  Support Vector Method for Function Approximation, Regression Estimation and Signal Processing , 1996, NIPS.

[56]  Alexander J. Smola,et al.  Support Vector Regression Machines , 1996, NIPS.

[57]  Stephen V. Rice,et al.  The Fourth Annual Test of OCR Accuracy , 1995 .

[58]  John Lafferty,et al.  Grammatical Trigrams: A Probabilistic Model of Link Grammar , 1992 .

[59]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[60]  L. R. Rabiner,et al.  A comparative study of several dynamic time-warping algorithms for connected-word recognition , 1981, The Bell System Technical Journal.

[61]  W. R. Buckland,et al.  Outliers in Statistical Data , 1979 .

[62]  Daniel S. Hirschberg,et al.  A linear space algorithm for computing maximal common subsequences , 1975, Commun. ACM.

[63]  I. G. BONNER CLAPPISON Editor , 1960, The Electric Power Engineering Handbook - Five Volume Set.