IBM High-Five: Highlights From Intelligent Video Engine

We introduce a novel multi-modal system for auto-curating golf highlights that fuses information from players' reactions (celebration actions), spectators (crowd cheering), and the commentator (tone of voice and word analysis) to determine the most interesting moments of a game. The start of each highlight is determined with additional metadata (the player's name and the hole number), allowing personalized content summarization and retrieval. Our system was demonstrated at the 2017 Masters, a major golf tournament, generating real-time highlights from four live video streams over four days.
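The multi-modal fusion described above can be sketched as a simple late fusion of per-modality excitement scores. The weights, segment structure, and field names below are illustrative assumptions for this sketch, not the system's actual parameters.

```python
# A minimal late-fusion sketch: each video segment is assumed to carry an
# excitement score in [0, 1] from three modalities (player reaction, crowd
# cheering, commentator excitement). Weights are hypothetical.

def fuse_excitement(segments, weights=(0.4, 0.3, 0.3)):
    """Combine the three per-modality scores into one fused score per segment."""
    w_player, w_crowd, w_comm = weights
    return [
        {**seg, "score": (w_player * seg["player"]
                          + w_crowd * seg["crowd"]
                          + w_comm * seg["commentator"])}
        for seg in segments
    ]

def top_highlights(segments, k=2):
    """Rank segments by fused excitement and return the top-k as highlights."""
    return sorted(fuse_excitement(segments),
                  key=lambda s: s["score"], reverse=True)[:k]
```

Segments could also carry metadata such as the player's name and hole number, so the same ranking supports personalized retrieval (e.g. filtering to one player's top moments before ranking).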
