Real-time recognition of surgical tasks in eye surgery videos

Many surgeries, including eye surgeries, are now video-monitored. This paper presents an automatic video analysis system that recognizes surgical tasks in real time. The proposed system relies on the Content-Based Video Retrieval (CBVR) paradigm: it characterizes short subsequences in the video stream and searches a video archive for subsequences with similar structures. A fixed-length feature vector is built for each subsequence; these feature vectors are invariant to variations in duration and temporal structure among the target surgical tasks. Fast nearest neighbor searches in the video archive are therefore possible. The retrieved video subsequences are then used to recognize the current surgical task by analogy reasoning. The system can be trained to recognize any surgical task using weak annotations only. It was applied to a dataset of 23 epiretinal membrane surgeries, in which three surgical tasks were annotated, and to a dataset of 100 cataract surgeries, in which nine surgical tasks were annotated. To assess its generality, the system was also applied to a dataset of 1,707 movie clips in which 12 human actions were annotated. High task recognition scores were measured on all three datasets. In future work, real-time task recognition will be used to communicate with surgeons (trainees in particular) or with surgical devices.
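The retrieval-then-recognition idea described above can be illustrated with a minimal sketch. It is not the system's actual implementation. It assumes that each subsequence has already been reduced to a fixed-length feature vector, uses an exhaustive Euclidean nearest neighbor search, and replaces the analogy reasoning step with a simple majority vote. The function name `recognize_task`, the parameter `k`, the toy archive, and the labels `task_a` and `task_b` are all hypothetical.

```python
import numpy as np


def recognize_task(query, archive_features, archive_labels, k=5):
    """Return the majority label among the k archive vectors nearest to `query`.

    Assumption: a plain majority vote stands in for the analogy reasoning
    step; the abstract does not specify how retrieved neighbors are combined.
    """
    # Euclidean distance from the query to every archived subsequence
    dists = np.linalg.norm(archive_features - query, axis=1)
    # Indices of the k closest archived subsequences
    nearest = np.argsort(dists)[:k]
    # Majority vote over the labels of the retrieved subsequences
    labels, counts = np.unique(archive_labels[nearest], return_counts=True)
    return labels[np.argmax(counts)]


# Toy archive (hypothetical data): two tasks clustered around different centers
rng = np.random.default_rng(0)
archive_features = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 8)),
    rng.normal(1.0, 0.1, size=(20, 8)),
])
archive_labels = np.array(["task_a"] * 20 + ["task_b"] * 20)

# Feature vector of the current subsequence in the video stream
query = np.full(8, 0.95)
predicted = recognize_task(query, archive_features, archive_labels, k=5)
print(predicted)
```

Because every subsequence maps to a vector of the same length, the search reduces to a standard nearest neighbor query; for a large archive, the exhaustive scan in this sketch could be swapped for an approximate index.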
