Attention-Based Surgical Phase Boundaries Detection in Laparoscopic Videos

We propose a deep learning-based method for identifying the boundaries of all surgical phases in a laparoscopic video. The model builds on a sequence-to-sequence architecture with an attention mechanism and maps the extracted visual features to the frame numbers at which each phase begins and ends. The main novelty is that the alignment vectors for each phase are taken as the outputs and are trained directly to select the frame indices. We evaluated the model on a large, publicly available dataset of laparoscopic cholecystectomy procedures and obtained a mean absolute error (MAE) of 48 seconds.
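To make the idea of using alignment vectors as outputs more concrete, the following is a minimal sketch, not the authors' implementation. It assumes dot-product attention scoring, a negative log-likelihood training signal, and a known frame rate for converting frame errors to seconds; none of these details are stated in the abstract, and all function names (`alignment_vector`, `select_index`, `pointer_loss`, `mae_seconds`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_vector(frame_features, query):
    """Attention weights of one decoder query over all frames.

    Assumption: dot-product scoring; the abstract does not name the
    scoring function.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in frame_features]
    return softmax(scores)


def select_index(alignment):
    """The alignment vector is itself the output: pick its argmax frame."""
    return max(range(len(alignment)), key=alignment.__getitem__)


def pointer_loss(alignment, target_index):
    """Assumed training signal: negative log-likelihood of the true frame."""
    return -math.log(alignment[target_index])


def mae_seconds(pred_frames, true_frames, fps):
    """Mean absolute boundary error, converted from frames to seconds."""
    total = sum(abs(p - t) for p, t in zip(pred_frames, true_frames))
    return total / (len(pred_frames) * fps)


# Toy example: six frames with 2-D features and one decoder query
# standing in for "start of a phase".
features = [[0.0, 1.0], [0.1, 0.9], [2.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.3]]
query = [1.0, 0.0]

weights = alignment_vector(features, query)
predicted = select_index(weights)
loss = pointer_loss(weights, target_index=2)
```

In this sketch the decoder would emit one such alignment vector per phase boundary, so the output vocabulary is the set of frame positions itself rather than a fixed label set. That is why the same model can handle videos of different lengths.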