Content-Based Video Retrieval, Classification and Summarization: The State-of-the-Art and the Future

This chapter provides an overview of the video content modeling, retrieval, and classification techniques employed in existing content-based video indexing and retrieval (CBVIR) systems. Based on the modeling requirements of a CBVIR system, we analyze and categorize existing modeling approaches. Starting with a review of video content modeling and representation techniques, we study view-invariant representation approaches and analyze their performance. Based on the current state of CBVIR research, we identify video retrieval approaches based on spatial and temporal analysis. Subsequently, we present video classification approaches based on multidimensional distributed hidden Markov models. Finally, we summarize future trends and open problems in content-based video modeling, retrieval, and classification.

1 Video Content Modeling and Representation

In this section, we give a general treatment of video content modeling and representation. We first examine the general problems in this area. We then present the state-of-the-art approaches: representations based on curvature scale space (CSS) and centroid distance functions (CDF). We subsequently propose the null space invariant (NSI) representation for video classification and retrieval in the presence of camera motion. Moreover, we propose the tensor null space invariant (TNSI) representation for high-dimensional data. Finally, we give a brief overview of other approaches to video content modeling and representation, and of future trends.
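The idea behind a null space representation can be illustrated with a small numerical sketch. It is not the chapter's algorithm. It only assumes that a trajectory is stored as N planar points, that camera motion is approximated by an invertible affine map, and that the descriptor is the orthogonal projector onto the null space of the 3 x N homogeneous trajectory matrix. The function name `null_space_projector` and the sample values are hypothetical.

```python
import numpy as np

def null_space_projector(traj):
    """Return the N x N orthogonal projector onto the null space of the
    3 x N homogeneous matrix built from an (N, 2) trajectory.
    Illustrative descriptor only; the name and layout are assumptions."""
    n = traj.shape[0]
    # Rows are x, y and a row of ones (homogeneous coordinates).
    m = np.vstack([traj.T, np.ones(n)])
    # The right singular vectors with nonzero singular values span the row space.
    _, s, vt = np.linalg.svd(m, full_matrices=False)
    rank = int(np.sum(s > 1e-10 * s[0]))
    row_basis = vt[:rank]
    # The null space is the orthogonal complement of the row space.
    return np.eye(n) - row_basis.T @ row_basis

rng = np.random.default_rng(0)
traj = rng.normal(size=(8, 2))            # hypothetical 8-point trajectory

# Hypothetical affine view change: invertible 2 x 2 matrix plus a translation.
a = np.array([[1.5, -0.4], [0.3, 0.8]])
t = np.array([2.0, -1.0])
warped = traj @ a.T + t

p_original = null_space_projector(traj)
p_warped = null_space_projector(warped)

# An invertible affine map multiplies the homogeneous matrix on the left by an
# invertible 3 x 3 matrix, which leaves its row space and null space unchanged.
invariant = np.allclose(p_original, p_warped, atol=1e-8)
print("projector unchanged under affine map:", invariant)
```

The projector is compared instead of a null space basis because a basis is only defined up to an orthogonal change of coordinates, whereas the projector is unique.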
