Paper information - Semi-Supervised First-Person Activity Recognition in Body-Worn Video

Semi-Supervised First-Person Activity Recognition in Body-Worn Video

Author(s): Chen, Honglin; Li, Hao; Song, Alexander; Haberland, Matt; Akar, Osman; Dhillon, Adam; Zhou, Tiankuang; Bertozzi, Andrea L; Brantingham, P Jeffrey

Abstract: Body-worn cameras are now commonly used for logging daily life, sports, and law enforcement activities, creating a large volume of archived footage. This paper studies the problem of classifying frames of footage according to the activity of the camera-wearer, with an emphasis on application to real-world police body-worn video. Real-world datasets pose a different set of challenges from existing egocentric vision datasets: the amount of footage of different activities is unbalanced, the data contains personally identifiable information, and in practice it is difficult to provide substantial training footage for a supervised approach. We address these challenges by extracting features based exclusively on motion information and then segmenting the video footage using a semi-supervised classification algorithm. On publicly available datasets, our method achieves results comparable to, if not better than, supervised and/or deep learning methods using a fraction of the training data. It also shows promising results on real-world police body-worn video.

[1] A. Bertozzi,et al. An MBO Scheme on Graphs for Segmentation and Image Processing, 2012.

[2] Arjuna Flenner,et al. Announcement: Diffuse Interface Methods for Multiclass Segmentation of High-Dimensional Data, 2014.

[3] C. V. Jawahar,et al. Unsupervised Learning of Deep Feature Representation for Clustering Egocentric Actions , 2017, IJCAI.

[4] Charles L. Lawson,et al. Solving least squares problems , 1976, Classics in applied mathematics.

[5] Kiyoharu Aizawa,et al. Summarizing wearable video , 2001, Proceedings 2001 International Conference on Image Processing (Cat. No.01CH37205).

[6] Selim Esedoglu,et al. Auction dynamics: A volume constrained MBO scheme , 2018, J. Comput. Phys.

[7] Takahiro Okabe,et al. Fast unsupervised ego-action learning for first-person sports videos , 2011, CVPR 2011.

[8] Shmuel Peleg,et al. Compact CNN for indexing egocentric videos , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[9] Deva Ramanan,et al. Detecting activities of daily living in first-person camera views , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[10] Jean-Michel Morel,et al. Ego-Motion Classification for Body-Worn Videos, 2016.

[11] A. Bertozzi,et al. $\Gamma$-convergence of graph Ginzburg-Landau functionals , 2012, Advances in Differential Equations.

[12] Fatih Ozkan,et al. Boosted multiple kernel learning for first-person activity recognition , 2017, 2017 25th European Signal Processing Conference (EUSIPCO).

[13] Larry H. Matthies,et al. First-Person Activity Recognition: What Are They Doing to Me? , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[14] Kris M. Kitani,et al. Going Deeper into First-Person Activity Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Shmuel Peleg,et al. Temporal Segmentation of Egocentric Videos , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16] Nicu Sebe,et al. Deep appearance and motion learning for egocentric activity recognition , 2018, Neurocomputing.

[17] Larry H. Matthies,et al. Pooled motion features for first-person videos , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Samuel Williams,et al. OpenMP Parallelization and Optimization of Graph-Based Machine Learning Algorithms , 2016, IWOMP.

[19] David J. Fleet,et al. Performance of optical flow techniques , 1994, International Journal of Computer Vision.

[20] A. Bertozzi,et al. Γ-Convergence of Graph Ginzburg–Landau Functionals, 2012.

[21] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[22] A. Bertozzi,et al. Mean Curvature, Threshold Dynamics, and Phase Field Theory on Finite Graphs , 2013, 1307.0045.

[23] Martial Hebert,et al. Temporal segmentation and activity classification from first-person sensing , 2009, 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[24] Andrea L. Bertozzi,et al. Graph MBO method for multiclass segmentation of hyperspectral stand-off detection video , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[25] James M. Rehg,et al. Delving into egocentric actions , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Andrew M. Stuart,et al. Uncertainty Quantification in Graph-Based Classification of High Dimensional Data , 2017, SIAM/ASA J. Uncertain. Quantification.

[27] Jocelyn Chanussot,et al. A graph-based approach for feature extraction and segmentation of multimodal images , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[28] Jitendra Malik,et al. Spectral grouping using the Nystrom method , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29] Gunnar Farnebäck,et al. Two-Frame Motion Estimation Based on Polynomial Expansion , 2003, SCIA.

[30] H. Sebastian Seung,et al. Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[31] Pietro Perona,et al. Self-Tuning Spectral Clustering , 2004, NIPS.

[32] Joo-Hwee Lim,et al. Summarization of Egocentric Videos: A Comprehensive Survey , 2017, IEEE Transactions on Human-Machine Systems.

[33] Stanley Osher,et al. Unsupervised Classification in Hyperspectral Imagery With Nonlocal Total Variation and Primal-Dual Hybrid Gradient Algorithm , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[34] Andrea L. Bertozzi,et al. Convergence of the Graph Allen–Cahn Scheme , 2017, Journal of Statistical Physics.

[35] Ali Farhadi,et al. Understanding egocentric activities , 2011, 2011 International Conference on Computer Vision.

[36] Arjuna Flenner,et al. Multiclass Data Segmentation Using Diffuse Interface Methods on Graphs , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37] C. V. Jawahar,et al. Trajectory aligned features for first person action recognition , 2016, Pattern Recognit.

[38] Arjuna Flenner,et al. Diffuse Interface Models on Graphs for Classification of High Dimensional Data , 2012, SIAM Rev.

[39] Berthold K. P. Horn,et al. Determining Optical Flow , 1981, Other Conferences.

[40] Andrea Cavallaro,et al. A Long Short-Term Memory Convolutional Neural Network for First-Person Vision Activity Recognition , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[41] Alice Koniges,et al. Hyperspectral Image Classification Using Graph Clustering Methods, 2022.

[42] James M. Rehg,et al. Learning to Recognize Daily Actions Using Gaze , 2012, ECCV.

[43] Takeo Kanade,et al. An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.