Wisdom of the Crowd in Egocentric Video Curation

Videos recorded by wearable egocentric cameras can suffer from quality degradations that current methods cannot always fix. When several wearable cameras view the same scene, each with highly variable quality, they can be combined into a single high-quality video. Existing techniques select, for each point in time, the highest-quality video stream, but the highest-quality stream may not be the most relevant one; e.g., the best-quality video may come from a person who happens to look away from the main attraction. We propose curating a single video stream from multiple egocentric videos by requiring that the selected video also view the most interesting region of the scene. The importance of a region is determined by the "wisdom of the crowd", i.e., the number of cameras looking at it. The resulting video is more interesting and of higher quality than any individual video stream. Several examples demonstrate the effectiveness of this technique.
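
The per-frame selection idea described in the abstract can be summarized as scoring each camera by the quality of its footage plus the crowd-derived interest of the region it views, then picking the best camera at each time step. The following is a minimal sketch of that idea, assuming a precomputed per-frame quality score for each camera and a coarse assignment of each camera to a scene region; the function and parameter names (select_streams, w_interest, w_quality) are hypothetical, and temporal smoothness between consecutive selections is not modeled here.

```python
import numpy as np

def select_streams(quality, region_of, num_regions, w_interest=1.0, w_quality=1.0):
    """Pick one camera per time step, trading off per-frame quality against
    how many cameras look at the same scene region ("wisdom of the crowd").

    quality:   (T, K) array, quality[t, k] = quality score of camera k at time t
    region_of: (T, K) int array, region_of[t, k] = index of the scene region
               viewed by camera k at time t
    Returns a list of T selected camera indices.
    """
    T, K = quality.shape
    selected = []
    for t in range(T):
        # Interest of a region = number of cameras currently viewing it.
        counts = np.bincount(region_of[t], minlength=num_regions)
        # Score each camera by the crowd interest of its region plus its own quality.
        scores = w_interest * counts[region_of[t]] + w_quality * quality[t]
        selected.append(int(np.argmax(scores)))
    return selected

# Toy example: 4 time steps, 3 cameras, 2 scene regions.
q = np.array([[0.9, 0.4, 0.5], [0.2, 0.8, 0.7], [0.6, 0.3, 0.9], [0.5, 0.5, 0.5]])
r = np.array([[0, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 1]])
print(select_streams(q, r, num_regions=2))
```

A greedy per-frame choice like this can switch cameras too often in practice; a smoothness term over consecutive frames (e.g., solved by dynamic programming) would be a natural extension, but is omitted from this sketch.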
