A new approach for real-time target tracking in videos

Video segmentation is fundamentally challenging, but it has many computer vision applications. The goal of video segmentation is to isolate a target object from its background across a sequence of frames. It is a 3D problem that incorporates the dimension of time. That makes the input an order of magnitude larger than in 2D problems, so computational efficiency is a major challenge. Time-efficient tracking algorithms allow for tailing targets of interest in real time and for tracing these objects in large visual data sets. These capabilities are useful in law enforcement, analyzing shoppers' behavior, sports broadcasting (having a camera zoom in on a ball at all times), and many more applications.

Tracking algorithms in the literature fall into three main classes: the active-contour approach,1,2 statistical and stochastic methods,3,4 and graph-theory-based tracking.5,6 The active-contour approach uses continuous models coupled with consistency constraints to delineate a target object's boundary. Since digital images and videos are innately discrete, these methods introduce errors when converting the discrete input to a continuous function and when converting the continuous output back to a discrete solution. Statistical and stochastic schemes, by comparison, rely heavily on iterative steps that are computationally intense and neither guarantee an optimal solution nor produce the same output over different runs with the same input data. The third approach, which we employ, formulates the task as a graph problem. Until recently, graph formulations incorporated a set of motion-consistency constraints, meaning the object's location in the current frame is constrained to appear close to its location in the previous frame, shifted by the estimated motion. This approach is vulnerable to occlusions and often results in computationally complex problems.
In contrast to existing graph-theoretic methods, our approach represents motion as a feature (like color or position), which applies neither heuristics nor motion-consistency constraints.

Figure 1. Representative frames from surveillance sequences taken from the Context Aware Vision using Image-based Active Recognition (CAVIAR) data set. CAVIAR is a project of the European Commission's Information Society Technology program.
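To illustrate the idea of treating motion as just another feature, the following minimal sketch (not the authors' exact formulation; all function names, the frame-differencing motion estimate, and the Gaussian weighting are illustrative assumptions) builds a per-pixel feature vector of intensity, position, and motion magnitude, then scores pixel-pair similarity as a graph edge weight:

```python
import math

def motion_feature(prev_frame, curr_frame):
    """Crude per-pixel motion estimate via frame differencing
    (an assumption; any optical-flow estimate could be used)."""
    return [[abs(c - p) for p, c in zip(prow, crow)]
            for prow, crow in zip(prev_frame, curr_frame)]

def feature_vector(frame, motion, x, y):
    """Feature vector per pixel: (intensity, x, y, motion magnitude).
    Motion enters on equal footing with color and position."""
    return (frame[y][x], x, y, motion[y][x])

def edge_weight(f1, f2, sigma=10.0):
    """Graph edge weight: Gaussian similarity over feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(f1, f2))
    return math.exp(-d2 / (2 * sigma ** 2))

prev_frame = [[10, 10, 10],
              [10, 10, 10],
              [10, 10, 10]]
curr_frame = [[10, 10, 10],
              [10, 90, 10],
              [10, 10, 10]]  # a bright object appears at the centre pixel

motion = motion_feature(prev_frame, curr_frame)
f_centre = feature_vector(curr_frame, motion, 1, 1)
f_corner = feature_vector(curr_frame, motion, 0, 0)
w = edge_weight(f_centre, f_corner)  # near zero: weak edge
```

Because the moving pixel and the static background pixel differ in both intensity and motion, the edge between them carries little weight, so a minimum-cut style segmentation naturally separates them without any explicit motion-consistency constraint.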

[1] P. Pérez et al., Track and Cut: simultaneous tracking and segmentation of multiple objects with graph cuts, EURASIP J. Image Video Process., 2008.

[2] B. Fishbain et al., Real-time stabilization of long range observation system turbulent video, J. Real-Time Image Process., 2007.

[3] A. M. Tekalp et al., Simultaneous motion estimation and segmentation, IEEE Trans. Image Process., 1997.

[4] K.-K. Ma et al., A new diamond search algorithm for fast block-matching motion estimation, IEEE Trans. Image Process., 2000.

[5] J. Yu et al., Multi-target tracking in crowded scenes, DAGM-Symposium, 2011.

[6] D. S. Hochbaum, The pseudoflow algorithm: a new algorithm for the maximum-flow problem, Oper. Res., 2008.

[7] D. S. Hochbaum, Polynomial time algorithms for ratio regions and a variant of normalized cut, IEEE Trans. Pattern Anal. Mach. Intell., 2010.

[8] L. Van Gool et al., Markovian tracking-by-detection from a single, uncalibrated camera, 2009.

[9] R. Stiefelhagen et al., Evaluating multiple object tracking performance: the CLEAR MOT metrics, EURASIP J. Image Video Process., 2008.

[10] R. Goldenberg et al., Behavior classification by eigendecomposition of periodic motions, Pattern Recognit., 2005.

[11] A. Bugeau et al., Tracking with occlusions via graph cuts, IEEE Trans. Pattern Anal. Mach. Intell., 2011.

[12] T. Brox et al., Variational motion segmentation with level sets, ECCV, 2006.

[13] L. Van Gool et al., Online multiperson tracking-by-detection from a single, uncalibrated camera, IEEE Trans. Pattern Anal. Mach. Intell., 2011.