The development of a video retrieval system using a clinician-led approach

Abstract: Patient video taken at home can provide valuable insight into recovery progress during a programme of physical therapy, but it is very time-consuming for clinicians to review. Our work focussed on (i) enabling any patient to share information about progress at home simply by sharing video, and (ii) building intelligent systems to support Physical Therapists (PTs) in reviewing this video data and extracting the necessary detail. This paper reports the development of the system, which is appropriate for future clinical use without reliance on a technical team, and the clinician involvement in that development. We contribute an interactive content-based video retrieval system that significantly reduces the time clinicians need to review videos, using human head movement as an example. The system supports query-by-movement (clinicians move their own body to define search queries) and retrieves the essential fine-grained movements needed for clinical interpretation. It does so by comparing sequences of image-based pose estimates (here, head rotations) using a distance metric (here, the Fréchet distance) and presenting a ranked list of similar movements to clinicians for review. In contrast to existing intelligent systems for retrospective review of human movement, the system supports flexible analysis in which clinicians can look for any movement that interests them. Evaluation by a group of PTs with expertise in training movement control showed that 96% of all relevant movements were identified, with time savings of as much as 99.1% compared to reviewing the target videos in full. The novelty of this contribution includes retrospective progress monitoring that preserves context through video, and content-based video retrieval that supports both fine-grained human actions and query-by-movement.
Future research, including large clinician-led studies, will refine the technical aspects and explore the benefits in terms of patient outcomes, PT time, and financial savings over the course of a programme of therapy. It is anticipated that this clinician-led approach will mitigate the reported slow clinical uptake of technology, to the benefit of patients.
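
The retrieval step described in the abstract (comparing a query sequence of head rotations against a target video's pose sequence with the Fréchet distance, then ranking the matches) might be sketched as follows. This is an illustrative sketch only, not the authors' implementation. It assumes poses are (yaw, pitch, roll) triples in degrees, uses plain Euclidean distance between those triples as a stand-in rotation metric, uses the discrete form of the Fréchet distance, and slides a fixed query-length window over the target. The names `pose_distance`, `discrete_frechet`, and `rank_windows` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumed pose representation: (yaw, pitch, roll) in degrees.
Pose = Tuple[float, float, float]


def pose_distance(a: Pose, b: Pose) -> float:
    """Euclidean distance between two Euler-angle triples.

    A simplifying assumption; a proper rotation metric could be swapped in.
    """
    return math.dist(a, b)


def discrete_frechet(p: Sequence[Pose], q: Sequence[Pose]) -> float:
    """Discrete Fréchet distance between two pose sequences (dynamic programming)."""
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # ca[i][j] holds the coupling distance for prefixes p[:i+1] and q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = pose_distance(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


def rank_windows(
    query: Sequence[Pose], target: Sequence[Pose], stride: int = 1
) -> List[Tuple[int, float]]:
    """Score every query-length window of `target`; return (start, distance), best first."""
    w = len(query)
    scores = [
        (start, discrete_frechet(query, target[start:start + w]))
        for start in range(0, len(target) - w + 1, stride)
    ]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy query: a head turn from 0 to 20 degrees of yaw.
    query = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    # Toy target: mostly still, with the same turn starting at frame 3.
    still = (0.0, 0.0, 0.0)
    target = [still] * 3 + query + [still] * 2
    for start, dist in rank_windows(query, target)[:3]:
        print(f"window starting at frame {start}: distance {dist:.2f}")
```

In this sketch the ranked list plays the role of the results shown to the clinician, who would review only the top-ranked windows rather than the full video. A fixed window length is a simplification: movements performed faster or slower than the query would need variable-length candidate segments.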
