Viewpoint Selection for Human Actions

In many scenarios a dynamic scene is filmed by multiple video cameras located at different viewing positions. Visualizing such multi-view data on a single display raises an immediate question: which cameras capture better views of the scene? Typically (e.g., in TV broadcasts), a human producer manually selects the best view. In this paper we automate this process by evaluating the quality of the view captured by each camera. We regard human actions as three-dimensional shapes induced by their silhouettes in the space-time volume. The quality of a view is then evaluated from features of the space-time shape that correspond to limb visibility. Building on these features, we propose two view-quality approaches: one is generic, while the other can be trained to fit any preferred action recognition method. Our experiments show that the proposed view selection yields intuitive results that match common conventions, and that it further improves action recognition performance.
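To make the pipeline concrete, the sketch below stacks per-frame binary silhouettes into a space-time volume and ranks cameras by a simple score. The contour-complexity heuristic used here (boundary length relative to silhouette area, as a rough proxy for how clearly limbs protrude in a given view) is an illustrative assumption, not the feature set defined in the paper; all function names are hypothetical.

import numpy as np

def space_time_shape(silhouettes):
    # Stack per-frame binary silhouettes (each H x W) into a T x H x W volume.
    return np.stack(silhouettes, axis=0).astype(bool)

def view_quality(volume):
    # Illustrative score (assumed, not the paper's feature): average
    # scale-normalized contour length per frame. Views in which limbs
    # protrude from the torso expose longer silhouette boundaries
    # relative to silhouette area.
    scores = []
    for frame in volume:
        area = frame.sum()
        if area == 0:
            continue  # skip frames with no foreground
        # Boundary pixels: foreground pixels with at least one
        # background 4-neighbor.
        padded = np.pad(frame, 1)
        interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                    & padded[1:-1, :-2] & padded[1:-1, 2:])
        perimeter = (frame & ~interior).sum()
        scores.append(perimeter / np.sqrt(area))  # scale-invariant ratio
    return float(np.mean(scores)) if scores else 0.0

# Usage: given one silhouette sequence per camera, pick the view
# whose space-time shape scores highest.
# volumes = [space_time_shape(s) for s in per_camera_silhouettes]
# best_cam = max(range(len(volumes)), key=lambda i: view_quality(volumes[i]))

A trainable variant, as the abstract suggests, would replace this fixed heuristic with features whose weights are fit to a chosen action recognition method.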
