Segmenting visual actions based on spatio-temporal motion patterns

The analysis of human action captured in video sequences has been a topic of considerable interest in computer vision. Much of the previous work has focused on action or activity recognition, but has ignored the problem of detecting action boundaries in a video sequence containing unfamiliar and arbitrary visual actions. This paper presents an approach to this problem based on detecting temporal discontinuities in the spatial pattern of image motion that captures the action. We represent frame-to-frame optical flow in terms of the coefficients of the most significant principal components computed from all the flow fields within a given video sequence. We then detect discontinuities in the temporal trajectories of these coefficients using three different measures. We compare our segment boundaries against those detected by human observers on the same sequences in a recent independent psychological study of human perception of visual events, and we show experimental results on the two sequences used in that study. The results are promising both on visual evaluation and when compared against the results of the psychological study.
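The pipeline described above can be illustrated with a minimal sketch. It assumes the optical-flow fields have already been computed and stacked into a NumPy array of shape (T, H, W, 2). The function names, the number of components `k`, and the threshold `z` are hypothetical. The abstract does not name its three discontinuity measures, so the sketch substitutes a single simple stand-in (the magnitude of the frame-to-frame change in the coefficient vector, thresholded robustly); it is not the paper's method.

```python
import numpy as np

def flow_pca_coefficients(flows, k=3):
    """Project each flow field onto the top-k principal components
    computed from all flow fields in the sequence.

    flows: array of shape (T, H, W, 2), one flow field per frame pair.
    Returns an array of shape (T, k): one coefficient vector per field.
    """
    T = flows.shape[0]
    X = flows.reshape(T, -1).astype(float)  # one row per flow field
    X = X - X.mean(axis=0)                  # center before PCA
    # Rows of Vt are the principal directions, ordered by variance.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T

def boundary_candidates(coeffs, z=3.0):
    """Flag indices where the coefficient trajectory changes abruptly.

    Stand-in measure (an assumption, not one of the paper's three):
    Euclidean norm of the first difference of the coefficient vector,
    compared against median + z * MAD of that norm.
    """
    speed = np.linalg.norm(np.diff(coeffs, axis=0), axis=1)
    med = np.median(speed)
    mad = np.median(np.abs(speed - med)) + 1e-12
    # +1 so that each index names the first flow field after the jump.
    return np.flatnonzero(speed > med + z * mad) + 1
```

On a synthetic sequence whose motion switches from rightward to upward partway through, the switch point should appear among the returned indices. A robust threshold is used here only because a few large jumps would otherwise inflate a mean-and-variance estimate.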
