Application of 3D-wavelet statistics to video analysis

Video activity analysis is used in applications such as human action recognition, video retrieval, and video archiving. In this paper, we apply 3D wavelet transform statistics to natural video signals and use the resulting statistical attributes for video modeling and analysis. From the 3D wavelet transform coefficients, we investigate the marginal and joint statistics as well as Mutual Information (MI) estimates. We show that the marginal histograms are well approximated by Generalized Gaussian Density (GGD) functions, and that the MI between coefficients decreases as the activity level in the video increases. Joint statistics attributes are applied to scene activity grouping, yielding 87.3% grouping accuracy. In addition, marginal and joint statistics features extracted from the video are used for human action classification with Support Vector Machine (SVM) classifiers, and 93.4% of the human activities are correctly classified.
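As an illustration of the GGD approximation of marginal histograms, the sketch below fits a zero-mean GGD, p(x) = β / (2αΓ(1/β)) · exp(−(|x|/α)^β), to a list of subband coefficients. The abstract does not state which estimator the paper uses; moment matching on the ratio (E|x|)² / E[x²] is assumed here as one common choice, and the function names `ggd_moment_ratio` and `fit_ggd` are hypothetical.

```python
import math


def ggd_moment_ratio(beta):
    """Theoretical (E|x|)^2 / E[x^2] for a zero-mean GGD with shape beta."""
    return math.gamma(2.0 / beta) ** 2 / (
        math.gamma(1.0 / beta) * math.gamma(3.0 / beta)
    )


def fit_ggd(coeffs, lo=0.1, hi=10.0, iters=60):
    """Estimate (alpha, beta) of a zero-mean GGD by moment matching.

    Assumption: the coefficients are roughly zero-mean, as is typical
    for wavelet detail subbands. The search range [lo, hi] for the
    shape parameter is also an assumption of this sketch.
    """
    n = len(coeffs)
    mean_abs = sum(abs(c) for c in coeffs) / n
    mean_sq = sum(c * c for c in coeffs) / n
    if mean_sq == 0.0:
        raise ValueError("all coefficients are zero; GGD fit is undefined")

    # Empirical moment ratio, clamped to the range reachable on [lo, hi].
    target = mean_abs ** 2 / mean_sq
    target = min(max(target, ggd_moment_ratio(lo)), ggd_moment_ratio(hi))

    # The ratio increases monotonically with beta, so bisection applies.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ggd_moment_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)

    # Scale parameter from E|x| = alpha * Gamma(2/beta) / Gamma(1/beta).
    alpha = mean_abs * math.gamma(1.0 / beta) / math.gamma(2.0 / beta)
    return alpha, beta
```

Under this parameterization, a shape parameter β near 2 corresponds to a Gaussian marginal and β near 1 to a Laplacian one, so the fitted (α, β) pair gives a compact two-number description of each subband histogram.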
