Ultrasound Video Summarization using Deep Reinforcement Learning

Video is an essential imaging modality for diagnostics, e.g., in ultrasound imaging, endoscopy, or movement assessment. However, video has received comparatively little attention in the medical image analysis community. In clinical practice, it is challenging to use raw diagnostic video data efficiently because video takes a long time to process, annotate, or audit. In this paper we introduce a novel, fully automatic video summarization method tailored to the needs of medical video data. Our approach is framed as a reinforcement learning problem and produces agents that focus on preserving important diagnostic information. We evaluate our method on videos from fetal ultrasound screening, where commonly only a small fraction of the recorded data is used diagnostically. We show that our method outperforms alternative video summarization methods and that it preserves the essential information required by clinical diagnostic standards.
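The abstract does not specify the agent's state, actions, or reward. As a rough illustration of what framing summarization as a reinforcement learning problem can look like, the following is a minimal sketch, not the method of this paper. It assumes per-frame feature vectors have already been extracted, uses a Bernoulli frame-selection policy trained with REINFORCE, and swaps in a reward loosely modelled on the representativeness term of Zhou et al. (AAAI 2017) plus a hypothetical length penalty. It encodes no diagnostic information; `representativeness_reward`, `train_policy`, and `length_weight` are hypothetical names chosen for illustration.

```python
import numpy as np


def representativeness_reward(features, actions):
    """High when every frame lies close to some selected frame.

    Loosely follows the representativeness term of Zhou et al. (AAAI 2017);
    an illustrative stand-in, not the reward used for ultrasound here.
    """
    selected = np.flatnonzero(actions)
    if selected.size == 0:
        return 0.0  # an empty summary represents nothing
    # Euclidean distance from every frame to every selected frame: shape (T, k).
    diffs = features[:, None, :] - features[selected][None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Average distance to the nearest selected frame, mapped into (0, 1].
    return float(np.exp(-dists.min(axis=1).mean()))


def sigmoid(z):
    # Clip to avoid overflow in exp for large logits.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))


def train_policy(features, episodes=200, lr=0.1, length_weight=0.5, seed=0):
    """REINFORCE with a linear-logistic Bernoulli policy over frames.

    length_weight is a hypothetical penalty on the fraction of frames kept.
    Returns per-frame selection probabilities after training.
    """
    rng = np.random.default_rng(seed)
    n_frames, dim = features.shape
    w = np.zeros(dim)
    baseline = 0.0  # moving-average reward baseline to reduce variance
    for _ in range(episodes):
        probs = sigmoid(features @ w)
        # Sample a summary: keep frame t with probability probs[t].
        actions = (rng.random(n_frames) < probs).astype(float)
        reward = (representativeness_reward(features, actions)
                  - length_weight * actions.mean())
        # Gradient of the Bernoulli log-likelihood w.r.t. w is sum_t (a_t - p_t) x_t.
        grad = (actions - probs) @ features
        w += lr * (reward - baseline) * grad
        baseline = 0.9 * baseline + 0.1 * reward
    return sigmoid(features @ w)


# Usage on random stand-in features (50 frames, 8-dim), not real ultrasound data.
features = np.random.default_rng(1).normal(size=(50, 8))
probs = train_policy(features)
summary = np.flatnonzero(probs > 0.5)  # indices of frames kept in the summary
```

The Bernoulli policy keeps the action space per frame, which makes the REINFORCE gradient a simple closed form; a real system would replace the linear policy with a sequence model and the reward with terms reflecting diagnostic content.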
