An Improved Model for Segmentation and Recognition of Fine-Grained Activities with Application to Surgical Training Tasks

Automated segmentation and recognition of fine-grained activities is important for enabling new applications in industrial automation, human-robot collaboration, and surgical training. Many existing approaches to activity recognition assume that a video has already been segmented and perform classification using an abstract representation based on spatio-temporal features. While some approaches perform joint activity segmentation and recognition, they typically suffer from poor modeling of transitions between actions and from representations that do not incorporate contextual information about the scene. In this paper, we propose a model for action segmentation and recognition that improves upon existing work in two directions. First, we develop a variation of the Skip-Chain Conditional Random Field that captures long-range state transitions between actions by using higher-order temporal relationships. Second, we argue that in constrained environments, where the relevant set of objects is known, it is better to develop features from high-level object relationships that have semantic meaning than to rely on abstract features. We apply our approach to a set of tasks common in robotic surgical training: suturing, knot tying, and needle passing, and show that our method increases micro and macro accuracy by 18.46% and 44.13% relative to the state of the art on a widely used robotic surgery dataset.
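To make the first contribution concrete, the following is a minimal sketch of how a skip-chain CRF scores a label sequence: in addition to the usual per-frame (unary) and adjacent-frame (pairwise) potentials of a linear-chain CRF, a third term links frames a fixed distance apart, capturing longer-range temporal relationships. The potentials and the skip distance `d` here are illustrative assumptions, not the paper's learned parameters.

```python
import numpy as np

def skip_chain_score(unary, pairwise, skip, labels, d=2):
    """Score a label sequence under a simple skip-chain CRF.

    unary:    (T, K) array of per-frame label scores
    pairwise: (K, K) array of transition scores between adjacent frames
    skip:     (K, K) array of transition scores between frames d apart
    labels:   length-T sequence of label indices
    d:        skip distance (illustrative choice)
    """
    T = len(labels)
    # Standard linear-chain terms: per-frame evidence plus adjacent transitions.
    score = sum(unary[t, labels[t]] for t in range(T))
    score += sum(pairwise[labels[t], labels[t + 1]] for t in range(T - 1))
    # Skip-chain term: edges between frames d steps apart add
    # long-range consistency between action labels.
    score += sum(skip[labels[t], labels[t + d]] for t in range(T - d))
    return score
```

Exact maximum-a-posteriori inference in such a model is more involved than Viterbi decoding on a chain, since the skip edges introduce loops in the graph; approximate methods such as loopy belief propagation are a common choice.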
