Automatic Content Curation System for Multiple Live Sport Video Streams

In this paper, we develop a method for creating personalized, high-presence multi-channel content for a sports game through real-time content curation of media streams captured and created by spectators. We use the live TV broadcast as ground-truth data and construct a machine-learning-based model that automatically curates multiple videos captured by spectators from different angles and zoom levels. The live TV broadcast of a baseball game follows curation rules that select a camera at a specific angle for specific scenes (e.g., a pitcher throwing a ball). As inputs for constructing the model, we use metadata for each fixed-interval segment of the baseball videos, such as image features (e.g., whether a pitcher is on the screen) and game progress data (e.g., the inning number and the batting order). The output is the camera ID (among the multiple spectator cameras) at each point in time. For evaluation, we targeted Spring-Selection high-school baseball games. As training data, we used the image features, the game progress data, and the camera angle selected at each point in time in the TV broadcast. We used videos of a baseball game captured with handheld video cameras from seven different positions in Hanshin Koshien Stadium and generated a sample data set by dividing the videos into fixed-interval segments. We divided the sample data set into training and test sets and evaluated our method with two validation schemes: (1) 10-fold cross-validation and (2) hold-out validation (e.g., training on the first and second innings and testing on the third). As a result, our method predicted the camera-switching timings with a weighted-average F-measure of 72.53% for the base camera work and 92.1% for the fixed camera work.
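To make the learning setup concrete, the sketch below shows one plausible realization under stated assumptions: each fixed-interval segment becomes a row of metadata features (image features plus game progress data) labeled with the camera ID matching the TV broadcast's choice, and a standard classifier is trained on those rows. The specific feature columns, the random-forest learner, and the toy data are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of the curation model, assuming scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Each row is one fixed-interval video segment. The columns are
# hypothetical metadata features:
#   pitcher_on_screen, batter_on_screen   (binary image features)
#   inning, batting_order                 (game progress data)
X = np.array([
    [1, 0, 1, 1],
    [0, 1, 1, 2],
    [1, 0, 2, 5],
    [0, 1, 3, 8],
])
# Labels: the ID (1..7) of the spectator camera that matches the
# TV broadcast's camera choice for the same segment (ground truth).
y = np.array([3, 5, 3, 5])

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Validation (1): 10-fold cross-validation with a weighted F-measure.
# (Requires a full-sized data set; the four toy rows above are too
# few for cv=10, so this line is left commented out.)
# scores = cross_val_score(model, X, y, cv=10, scoring="f1_weighted")

# Validation (2): hold-out by inning -- train on innings 1-2 and
# test on inning 3 (the inning is column 2 of X).
train = X[:, 2] <= 2
model.fit(X[train], y[train])
print(model.predict(X[~train]))  # predicted camera ID per test segment
```

At inference time, the same per-segment prediction can drive the stream switcher: whichever spectator camera the model outputs for the current segment is the one shown to the viewer.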
