Fast unsupervised ego-action learning for first-person sports videos

Portable high-quality sports cameras (e.g., head- or helmet-mounted) built for recording dynamic first-person video are becoming common among sports enthusiasts. We address the novel task of discovering first-person action categories (which we call ego-actions), which is useful for tasks such as video indexing and retrieval. To learn ego-action categories, we investigate the use of motion-based histograms and unsupervised learning algorithms to quickly cluster video content. Our approach assumes a completely unsupervised scenario: labeled training videos are not available, videos are not pre-segmented, and the number of ego-action categories is unknown. In our proposed framework, we show that a stacked Dirichlet process mixture model can be used to automatically learn both a motion histogram codebook and the set of ego-action categories. We quantitatively evaluate our approach on both in-house and public YouTube videos and demonstrate robust ego-action categorization across several sports genres. Comparative analysis shows that our approach outperforms other state-of-the-art topic models in both classification accuracy and computational speed. Preliminary results indicate that, on average, the categorical content of a 10-minute video sequence can be indexed in under 5 seconds.
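
As a rough illustration of the stacked pipeline described above, the sketch below clusters per-frame optical-flow histograms into a motion codebook (stage one), then clusters per-window codeword histograms into ego-action categories (stage two). This is a minimal sketch, not the paper's implementation: it substitutes scikit-learn's BayesianGaussianMixture with a Dirichlet process prior for the paper's DPMM inference and Farnebäck optical flow for the paper's motion features, and all function names, parameter values, and bin ranges are illustrative assumptions.

```python
# Hypothetical two-stage ("stacked") Dirichlet process mixture sketch,
# not the paper's exact method or parameters.
import cv2
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def flow_histograms(video_path, mag_bins=4, ang_bins=8):
    """Per-frame histogram over optical-flow magnitude x direction."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    assert ok, "could not read video"
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    hists = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        # Joint magnitude/direction histogram, L1-normalized per frame.
        h, _, _ = np.histogram2d(mag.ravel(), ang.ravel(),
                                 bins=(mag_bins, ang_bins),
                                 range=((0, 20), (0, 2 * np.pi)))
        hists.append(h.ravel() / max(h.sum(), 1.0))
        prev = gray
    cap.release()
    return np.array(hists)

def stacked_dpmm(hists, window=30, max_words=20, max_actions=10):
    # Stage 1: DP mixture over frame histograms -> motion codebook.
    # n_components is only a truncation level; the DP prior prunes
    # unused components, so the effective codebook size is learned.
    dp_words = BayesianGaussianMixture(
        n_components=max_words,
        weight_concentration_prior_type="dirichlet_process")
    words = dp_words.fit_predict(hists)
    # Bag-of-codewords histogram for each temporal window.
    bow = np.array([np.bincount(words[i:i + window], minlength=max_words)
                    for i in range(0, len(words) - window + 1, window)],
                   dtype=float)
    bow /= bow.sum(axis=1, keepdims=True)
    # Stage 2: DP mixture over window histograms -> ego-action labels.
    dp_actions = BayesianGaussianMixture(
        n_components=max_actions,
        weight_concentration_prior_type="dirichlet_process")
    return dp_actions.fit_predict(bow)
```

A call such as `stacked_dpmm(flow_histograms("run.mp4"))` (file name hypothetical) would return one ego-action label per temporal window; because neither stage needs labeled data or a preset category count, the sketch mirrors the unsupervised setting assumed above.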
