Smart Director: An Event-Driven Directing System for Live Broadcasting

Multi-camera live video broadcasting normally requires a wide range of skills and domain expertise. As the number of cameras keeps increasing, directing a live sports broadcast has become more complicated and challenging than ever before. Broadcast directors must be far more focused, responsive, and knowledgeable during production. To relieve directors of this intensive effort, we develop an automated sports broadcast directing system, called Smart Director, which aims to mimic the typical human-in-the-loop broadcasting process and automatically create near-professional broadcast programs in real time using a set of advanced multi-view video analysis algorithms. Inspired by the so-called “three-event” construction of sports broadcasts [14], we build our system as an event-driven pipeline consisting of three consecutive novel components: (1) Multi-View Event Localization, which detects events by modeling multi-view correlations; (2) Multi-View Highlight Detection, which ranks camera views by visual importance for view selection; and (3) the Auto-Broadcasting Scheduler, which controls the production of the broadcast video. To the best of our knowledge, our system is the first end-to-end automated directing system for multi-camera sports broadcasting that is completely driven by the semantic understanding of sports events. It is also the first system to address the novel problem of multi-view joint event detection through cross-view relation modeling. We conduct both objective and subjective evaluations on a real-world multi-camera soccer dataset, which demonstrate that the quality of our auto-generated videos is comparable to that of human-directed videos. Thanks to its faster response, our system is able to capture more fast-passing and short-duration events that are usually missed by human directors.
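The three-stage, event-driven flow described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the abstract does not specify the models, so the cross-view averaging, the fixed threshold, and the mean-score ranking below are placeholder assumptions, and all names (`Event`, `localize_events`, `rank_views`, `schedule`) are hypothetical. Per-frame scores are assumed to arrive precomputed, one list per camera view.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Event:
    start: float  # seconds
    end: float    # seconds


def localize_events(event_scores: Dict[str, List[float]],
                    threshold: float = 0.5,
                    fps: float = 1.0) -> List[Event]:
    """Placeholder for Multi-View Event Localization.

    Assumption: views are fused by simple averaging and events are the
    runs of frames whose fused score reaches the threshold.
    """
    n = min(len(s) for s in event_scores.values())
    fused = [sum(s[t] for s in event_scores.values()) / len(event_scores)
             for t in range(n)]
    events, start = [], None
    for t, score in enumerate(fused):
        if score >= threshold and start is None:
            start = t
        elif score < threshold and start is not None:
            events.append(Event(start / fps, t / fps))
            start = None
    if start is not None:  # event still open at the end of the stream
        events.append(Event(start / fps, n / fps))
    return events


def rank_views(highlight_scores: Dict[str, List[float]],
               event: Event, fps: float = 1.0) -> List[str]:
    """Placeholder for Multi-View Highlight Detection.

    Assumption: a view's importance is its mean highlight score
    over the frames of the event.
    """
    lo, hi = int(event.start * fps), int(event.end * fps)

    def mean_score(view: str) -> float:
        segment = highlight_scores[view][lo:hi]
        return sum(segment) / len(segment) if segment else 0.0

    return sorted(highlight_scores, key=mean_score, reverse=True)


def schedule(events: List[Event],
             highlight_scores: Dict[str, List[float]],
             default_view: str,
             duration: float) -> List[Tuple[float, float, str]]:
    """Placeholder for the Auto-Broadcasting Scheduler.

    Assumption: show the default view between events and the
    top-ranked view during each event.
    """
    plan, cursor = [], 0.0
    for ev in events:
        if ev.start > cursor:
            plan.append((cursor, ev.start, default_view))
        plan.append((ev.start, ev.end, rank_views(highlight_scores, ev)[0]))
        cursor = ev.end
    if cursor < duration:
        plan.append((cursor, duration, default_view))
    return plan


# Toy input: two views, five frames at 1 fps (made-up scores).
event_scores = {"main": [0.1, 0.2, 0.9, 0.8, 0.1],
                "goal": [0.0, 0.1, 0.7, 0.9, 0.2]}
highlight_scores = {"main": [0.5, 0.5, 0.4, 0.3, 0.5],
                    "goal": [0.1, 0.1, 0.9, 0.8, 0.1]}

events = localize_events(event_scores)
plan = schedule(events, highlight_scores, default_view="main", duration=5.0)
print(plan)
```

The point of the sketch is the data flow only: event segments are detected jointly from all views, views are then ranked per event, and the scheduler turns those rankings into a cut list of (start, end, view) tuples.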

[1]  Jim Owens. Television Sports Production, 2015.

[2]  Ke Wang, et al. Robust Visual Object Tracking with Two-Stream Residual Convolutional Networks, 2020, 2020 25th International Conference on Pattern Recognition (ICPR).

[3]  A. Barnfield. Soccer, Broadcasting, and Narrative, 2013.

[4]  Tao Mei, et al. Gaussian Temporal Awareness Networks for Action Localization, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Alexei A. Efros, et al. Image-to-Image Translation with Conditional Adversarial Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Chong-Wah Ngo, et al. Click-through-based Subspace Learning for Image Search, 2014, ACM Multimedia.

[7]  Huang-Chia Shih, et al. A Survey of Content-Aware Video Analysis for Sports, 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[8]  Tao Mei, et al. Learning Deep Intrinsic Video Representation by Exploring Temporal Coherence and Graph Structure, 2016, IJCAI.

[9]  Cees Snoek, et al. Pointly-Supervised Action Localization, 2018, International Journal of Computer Vision.

[10]  Maneesh Agrawala, et al. Computational video editing for dialogue-driven scenes, 2017, ACM Trans. Graph.

[11]  C. Schmid, et al. Category-Specific Video Summarization, 2014, ECCV.

[12]  Chng Eng Siong, et al. Automatic composition of broadcast sports video, 2008, Multimedia Systems.

[13]  Xu Zhao, et al. Single Shot Temporal Action Detection, 2017, ACM Multimedia.

[14]  Aljoscha Smolic, et al. Computational sports broadcasting: Automated director assistance for live sports, 2013, 2013 IEEE International Conference on Multimedia and Expo (ICME).

[15]  Patrick Charpentier, et al. Automatic Camera Selection in the Context of Basketball Game, 2018, ICISP.

[16]  Cordelia Schmid, et al. Action and Event Recognition with Fisher Vectors on a Compact Feature Set, 2013, 2013 IEEE International Conference on Computer Vision.

[17]  Takatsugu Hirayama, et al. Context-Dependent Viewpoint Sequence Recommendation System for Multi-view Video, 2014, 2014 IEEE International Symposium on Multimedia.

[18]  Yun Fu, et al. Deep Sequential Context Networks for Action Prediction, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Ke Zhang, et al. Retrospective Encoders for Video Summarization, 2018, ECCV.

[20]  James J. Little, et al. Camera Selection for Broadcasting Soccer Games, 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[21]  Tao Mei, et al. Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[22]  Tao Mei, et al. Hierarchy Parsing for Image Captioning, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Ramakant Nevatia, et al. CTAP: Complementary Temporal Action Proposal Generation, 2018, ECCV.

[24]  Tao Mei, et al. Deep Metric Learning With Density Adaptivity, 2019, IEEE Transactions on Multimedia.

[25]  Jesús Chamorro-Martínez, et al. Diatom autofocusing in brightfield microscopy: a comparative study, 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[26]  Tao Mei, et al. SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning, 2020, ArXiv.

[27]  Anil K. Jain, et al. Lane boundary detection using a multiresolution Hough transform, 1997, Proceedings of International Conference on Image Processing.

[28]  Andrea Cavallaro, et al. Multi-camera Scheduling for Video Production, 2011, 2011 Conference for Visual Media Production.

[29]  Limin Wang, et al. Temporal Action Detection with Structured Segment Networks, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Luis Torres, et al. Automatic summarization of soccer highlights using audio-visual descriptors, 2015, SpringerPlus.

[31]  James J. Little, et al. Learning Sports Camera Selection From Internet Videos, 2019, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[32]  Cordelia Schmid, et al. Relational Action Forecasting, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Tao Mei, et al. Relation Distillation Networks for Video Object Detection, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[34]  Chng Eng Siong, et al. Automatic replay generation for soccer video broadcasting, 2004, MULTIMEDIA '04.

[35]  Cees Snoek, et al. Dance With Flow: Two-In-One Stream Action Detection, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Shih-Fu Chang, et al. Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Yannis Kalantidis, et al. Less Is More: Learning Highlight Detection From Video Duration, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Chong-Wah Ngo, et al. Exploring Object Relation in Mean Teacher for Cross-Domain Detection, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Ke Zhang, et al. Video Summarization with Long Short-Term Memory, 2016, ECCV.

[40]  Tao Mei, et al. Learning Click-Based Deep Structure-Preserving Embeddings with Visual Attention, 2019, ACM Trans. Multim. Comput. Commun. Appl.

[41]  Takatsugu Hirayama, et al. Personal Multi-view Viewpoint Recommendation based on Trajectory Distribution of the Viewing Target, 2016, ACM Multimedia.

[42]  Tao Mei, et al. Mocycle-GAN: Unpaired Video-to-Video Translation, 2019, ACM Multimedia.

[43]  James J. Little, et al. Sports Camera Calibration via Synthetic Data, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[44]  J. Goldlust, et al. Playing for Keeps: Sport, the Media, and Society, 1989.

[45]  Tao Mei, et al. X-Linear Attention Networks for Image Captioning, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Tao Mei, et al. Exploring Visual Relationship for Image Captioning, 2018, ECCV.

[47]  Shifeng Zhang, et al. FaceBoxes: A CPU real-time face detector with high accuracy, 2017, 2017 IEEE International Joint Conference on Biometrics (IJCB).

[48]  Cordelia Schmid, et al. Temporal Localization of Actions with Actoms, 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Tao Mei, et al. Jointly Localizing and Describing Events for Dense Video Captioning, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  Anoop Gupta, et al. Automatically extracting highlights for TV Baseball programs, 2000, ACM Multimedia.

[51]  Chong-Wah Ngo, et al. Semi-supervised Hashing with Semantic Confidence for Large Scale Visual Search, 2015, SIGIR.

[52]  Tao Mei, et al. Exploiting Web Images for Video Highlight Detection With Triplet Deep Ranking, 2018, IEEE Transactions on Multimedia.

[53]  Xiaoyan Gu, et al. psDirector: An Automatic Director for Watching View Generation from Panoramic Soccer Video, 2019, MMM.

[54]  Bernard Ghanem, et al. SST: Single-Stream Temporal Action Proposals, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Tao Mei, et al. Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Ali Javed, et al. An Efficient Framework for Automatic Highlights Generation from Sports Videos, 2016, IEEE Signal Processing Letters.

[57]  Yue Chen, et al. iDirector: An Intelligent Directing System for Live Broadcast, 2020, ACM Multimedia.