Surgical Gesture Classification from Video Data

Much of the existing work on automatic classification of gestures and skill in robotic surgery is based on kinematic and dynamic cues, such as time to completion, speed, forces, torque, or robot trajectories. In this paper, we show that in a typical surgical training setup, video data can be equally discriminative. To that end, we propose and evaluate three approaches to surgical gesture classification from video. In the first, we model each video clip of each surgical gesture as the output of a linear dynamical system (LDS) and use metrics in the space of LDSs to classify new video clips. In the second, we extract spatio-temporal features from each video clip, learn a dictionary of spatio-temporal words, and classify new video clips with a bag-of-features (BoF) approach. In the third, we use multiple kernel learning (MKL) to combine the LDS and BoF approaches. Our experiments show that methods based on video data perform as well as state-of-the-art approaches based on kinematic data.
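
A minimal NumPy sketch of the three pipelines follows. It is an illustration under stated assumptions, not the authors' implementation: each clip is a `(p, T)` array of vectorized frames; the LDS is fit with the SVD-based identification commonly used for dynamic textures; the metric between LDSs is assumed to be the Martin distance, approximated with a finite observability horizon `m`; local spatio-temporal descriptors are assumed to be already extracted (no interest-point detector is shown); the dictionary is learned with plain k-means; and the MKL step is replaced by a fixed-weight sum of kernels, where the weights `w` stand in for those an MKL solver would learn. All function and parameter names are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a sketch under stated assumptions, not the authors' code.

# Approach 1: model each clip as a linear dynamical system (LDS).
def identify_lds(Y, n):
    """Fit x_{t+1} = A x_t + v_t, y_t = C x_t + w_t to one clip.

    Y: (p, T) array whose columns are vectorized frames; n: state dimension.
    Uses the SVD-based (suboptimal) identification common for dynamic textures.
    """
    Y = Y - Y.mean(axis=1, keepdims=True)             # subtract the mean frame
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                                      # (p, n) observation matrix
    X = np.diag(s[:n]) @ Vt[:n, :]                    # (n, T) estimated states
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])          # least squares: X[:,1:] ~ A X[:,:-1]
    return A, C

def observability(A, C, m):
    """Finite extended observability matrix [C; CA; ...; CA^(m-1)]."""
    blocks, M = [], C
    for _ in range(m):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def martin_distance(lds1, lds2, m=10):
    """Martin distance from the principal angles between the observability
    subspaces, truncated to horizon m (assumption: finite-horizon approximation)."""
    Q1, _ = np.linalg.qr(observability(*lds1, m))
    Q2, _ = np.linalg.qr(observability(*lds2, m))
    cos = np.linalg.svd(Q1.T @ Q2, compute_uv=False)  # cosines of the principal angles
    cos = np.clip(cos, 1e-12, 1.0)                    # guard against round-off and log(0)
    return float(np.sqrt(-2.0 * np.sum(np.log(cos))))  # d^2 = -ln prod cos^2

# Approach 2: bag of spatio-temporal features (BoF).
def kmeans(X, K, iters=20, seed=0):
    """Plain Lloyd's k-means; the K centers play the role of spatio-temporal words."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)].astype(float)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):                   # leave empty clusters unchanged
                centers[k] = X[labels == k].mean(axis=0)
    return centers

def bof_histogram(descriptors, codebook):
    """Normalized histogram of nearest codewords for one clip's (k, d) descriptors."""
    d2 = ((descriptors[:, None, :] - codebook[None]) ** 2).sum(axis=2)
    h = np.bincount(d2.argmin(axis=1), minlength=len(codebook)).astype(float)
    return h / max(h.sum(), 1.0)

def chi2_distance(h1, h2):
    """Chi-squared distance between two histograms (bins empty in both are skipped)."""
    denom = h1 + h2
    mask = denom > 0
    return float(0.5 * np.sum((h1[mask] - h2[mask]) ** 2 / denom[mask]))

# Approach 3: combine the two cues (stand-in for MKL).
def combined_kernel(d_lds, d_bof, w=(0.5, 0.5), gamma=(1.0, 1.0)):
    """Fixed-weight sum of exponentiated-distance kernels. An MKL solver would
    learn w; the result is not guaranteed to be positive semidefinite."""
    return w[0] * np.exp(-gamma[0] * d_lds) + w[1] * np.exp(-gamma[1] * d_bof)
```

In use, one would compute pairwise distances between clips under each cue, turn them into a Gram matrix with `combined_kernel`, and pass it to an SVM that accepts a precomputed kernel (LIBSVM offers this option). Because exponentiated Martin distances are not guaranteed to give a positive semidefinite kernel, the resulting matrix may need to be checked or regularized before training.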
