Semi-Supervised First-Person Activity Recognition in Body-Worn Video

Author(s): Chen, Honglin; Li, Hao; Song, Alexander; Haberland, Matt; Akar, Osman; Dhillon, Adam; Zhou, Tiankuang; Bertozzi, Andrea L; Brantingham, P Jeffrey

Abstract: Body-worn cameras are now commonly used for logging daily life, sports, and law enforcement activities, creating a large volume of archived footage. This paper studies the problem of classifying frames of footage according to the activity of the camera-wearer, with an emphasis on application to real-world police body-worn video. Real-world datasets pose a different set of challenges from existing egocentric vision datasets: the amount of footage of different activities is unbalanced, the data contain personally identifiable information, and in practice it is difficult to provide substantial training footage for a supervised approach. We address these challenges by extracting features based exclusively on motion information and then segmenting the video footage using a semi-supervised classification algorithm. On publicly available datasets, our method achieves results comparable to, if not better than, those of supervised and/or deep learning methods while using a fraction of the training data. It also shows promising results on real-world police body-worn video.
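To make the general idea of semi-supervised classification on a similarity graph concrete, here is a minimal sketch. It is not the algorithm used in the paper: it swaps in plain iterative label propagation with clamped labels, and the two-dimensional "motion features" are synthetic stand-ins. The function names `gaussian_affinity` and `propagate_labels`, the Gaussian kernel, and the parameters `sigma` and `n_iter` are all assumptions made for illustration.

```python
import numpy as np


def gaussian_affinity(features, sigma=1.0):
    """Dense similarity graph: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_norms = np.sum(features ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against round-off negatives
    weights = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)  # no self-loops
    return weights


def propagate_labels(weights, labels, n_classes, n_iter=200):
    """Simple label propagation (illustrative only, not the paper's method).

    labels holds a class index for labeled frames and -1 for unlabeled ones.
    """
    known = labels >= 0
    # Row-normalize so each frame averages the scores of its neighbors.
    transition = weights / weights.sum(axis=1, keepdims=True)
    scores = np.zeros((weights.shape[0], n_classes))
    scores[known, labels[known]] = 1.0
    clamped = scores.copy()
    for _ in range(n_iter):
        scores = transition @ scores
        scores[known] = clamped[known]  # keep the known labels fixed
    return scores.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for per-frame motion features of two activities.
    features = np.vstack([
        rng.normal(0.0, 0.3, size=(20, 2)),
        rng.normal(6.0, 0.3, size=(20, 2)),
    ])
    true_labels = np.repeat([0, 1], 20)
    # Only one labeled frame per activity; the rest are unlabeled (-1).
    labels = np.full(40, -1)
    labels[0] = 0
    labels[20] = 1
    predicted = propagate_labels(gaussian_affinity(features), labels, n_classes=2)
    print("accuracy:", np.mean(predicted == true_labels))
```

The sketch labels one frame per class and lets the graph carry those labels to the remaining frames, which is the sense in which such methods need only a fraction of the training data. A real pipeline would replace the synthetic features with descriptors computed from the video's motion.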
