Real-Time Task Recognition in Cataract Surgery Videos Using Adaptive Spatiotemporal Polynomials

This paper introduces a new algorithm for recognizing surgical tasks in real time in a video stream. The goal is to communicate information to the surgeon in a timely manner during video-monitored surgery. The proposed algorithm is applied to cataract surgery, the most common eye surgery. Cataract surgery videos are first normalized to compensate for eye motion and zoom level variations. The motion content of short video subsequences is then characterized at multiple scales using adaptive spatiotemporal polynomials. This characterization is particularly well suited to deformable moving objects with fuzzy borders, which are typical of surgical videos. Given a target surgical task, the system is trained to identify which spatiotemporal polynomials are usually extracted from videos when, and only when, this task is being performed. These key spatiotemporal polynomials are then searched for in new videos to recognize the target surgical task. To improve performance, the system jointly adapts the spatiotemporal polynomial basis and identifies the key spatiotemporal polynomials using the multiple-instance learning paradigm. The proposed system runs in real time and outperforms the previous solution from our group, both for surgical task recognition (Az = 0.851 on average, as opposed to Az = 0.794 previously) and for the joint segmentation and recognition of surgical tasks (Az = 0.856 on average, as opposed to Az = 0.832 previously).
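As an illustration of the general idea of characterizing motion with spatiotemporal polynomials, the following minimal sketch fits one component of a motion field over a short subsequence with a polynomial in (x, y, t) by ordinary least squares. It is not the method of the paper: the basis here is a fixed set of monomials rather than an adapted one, no multiscale processing or multiple-instance learning is involved, and the function names `monomial_exponents` and `fit_spatiotemporal_polynomial` are hypothetical. The motion field is assumed to be already available as an array of shape (T, H, W).

```python
import itertools
import numpy as np


def monomial_exponents(max_degree):
    """List all (i, j, k) with i + j + k <= max_degree, for terms x^i * y^j * t^k."""
    return [
        e
        for e in itertools.product(range(max_degree + 1), repeat=3)
        if sum(e) <= max_degree
    ]


def fit_spatiotemporal_polynomial(flow, max_degree=2):
    """Least-squares fit of a polynomial in (x, y, t) to one motion component.

    flow: array of shape (T, H, W), e.g. the horizontal displacement at each
          pixel of each frame in a short subsequence (assumed input).
    Returns the list of exponents and the matching coefficient vector.
    """
    T, H, W = flow.shape
    # Coordinates are rescaled to [-1, 1] to keep the design matrix well conditioned.
    t, y, x = np.meshgrid(
        np.linspace(-1.0, 1.0, T),
        np.linspace(-1.0, 1.0, H),
        np.linspace(-1.0, 1.0, W),
        indexing="ij",
    )
    exponents = monomial_exponents(max_degree)
    # One column per monomial, one row per (frame, pixel) sample.
    design = np.stack(
        [(x**i * y**j * t**k).ravel() for i, j, k in exponents], axis=1
    )
    coeffs, *_ = np.linalg.lstsq(design, flow.ravel(), rcond=None)
    return exponents, coeffs
```

In such a sketch, the coefficient vector acts as a compact descriptor of the motion in the subsequence; a recognition system could then compare descriptors from new videos against those observed during a given task.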
