Color aided motion-segmentation and object tracking for video sequences semantic analysis

The high rate at which digital multimedia is generated and used makes it necessary to develop systems that can process it efficiently. This can be achieved by extracting semantics from the video's low-level information. We present a novel algorithm that fuses color and motion information to extract semantics from a video sequence. The motion estimates are processed statistically to give areas of activity in the video. Color segmentation is applied to these areas, and also to their complementary regions in each frame, to segment the moving objects. The extracted color layers in the activity and background areas are compared using the earth mover's distance (EMD) and a novel method that we introduce, based on a likelihood ratio test (LRT). The segmentation results of our LRT-based approach are shown to be more robust than the EMD results, and both methods are shown to be more accurate than existing combined color-motion approaches. Furthermore, the LRT method allows the retrieval of additional semantics, namely “maps” that indicate the likelihood that a pixel belongs to a moving object. The areas of activity can be used to retrieve semantics about the kind of activity taking place. The color-aided segmentation of the moving entities provides a full description of their appearance, so it can be used, for example, to classify the video based on the objects in it. Experiments with real sequences show that this method leads to accurate results and useful semantics. © 2007 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 17, 174–189, 2007
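The idea of a per-pixel likelihood "map" can be illustrated with a minimal sketch. This is not the authors' LRT formulation, which the abstract does not spell out; it assumes a binary activity mask is already available and that color distributions are modeled with simple smoothed RGB histograms. The function names `color_histogram` and `log_likelihood_ratio_map`, the bin count, and the zero threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def _bin_index(pixels, bins):
    """Map (N, 3) uint8 RGB pixels to a flat joint-histogram bin index."""
    idx = (pixels.astype(np.int64) * bins) // 256  # per-channel bin in [0, bins)
    return idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]

def color_histogram(pixels, bins=8):
    """Normalized joint RGB histogram with add-one smoothing (assumed model)."""
    counts = np.bincount(_bin_index(pixels, bins), minlength=bins ** 3).astype(float)
    counts += 1.0  # avoid zero probabilities, so the log ratio stays finite
    return counts / counts.sum()

def log_likelihood_ratio_map(frame, activity_mask, bins=8):
    """Per-pixel log ratio of activity-area to background color likelihoods.

    frame: (H, W, 3) uint8 image; activity_mask: (H, W) bool array.
    Positive values mean the pixel's color is better explained by the
    activity-area histogram than by the background histogram.
    """
    pixels = frame.reshape(-1, 3)
    mask = activity_mask.reshape(-1)
    p_act = color_histogram(pixels[mask], bins)
    p_bg = color_histogram(pixels[~mask], bins)
    flat = _bin_index(pixels, bins)
    llr = np.log(p_act[flat]) - np.log(p_bg[flat])
    return llr.reshape(frame.shape[:2])

if __name__ == "__main__":
    # Synthetic frame: blue background, red object inside a larger activity area.
    frame = np.zeros((20, 20, 3), dtype=np.uint8)
    frame[..., 2] = 255
    frame[6:10, 6:10] = (255, 0, 0)
    activity = np.zeros((20, 20), dtype=bool)
    activity[5:11, 5:11] = True
    llr = log_likelihood_ratio_map(frame, activity)
    print("pixels labeled as moving object:", int((llr > 0).sum()))
```

In this toy case the activity area contains both object and background colors; the ratio suppresses the background colors because they are far more common outside the activity area, which is the intuition behind comparing color layers across the two regions.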
