The OD theory of TOD: the use and limits of temporal information for object discovery

We present the theory behind TOD (the Temporal Object Discoverer), a novel unsupervised system that uses only temporal information to discover objects across image sequences acquired by any number of uncalibrated cameras. The process is divided into three phases: (1) extraction of each pixel's temporal signature, a partition of the pixel's observations into sets that stem from different objects; (2) construction of a global schedule that explains the signatures in terms of the lifetimes of a set of quasi-static objects; (3) mapping of each pixel's observations to objects in the schedule according to the pixel's temporal signature. Our Global Scheduling (GSched) algorithm provably constructs a valid and complete global schedule when certain observability criteria are met. Our Quasi-Static Labeling (QSL) algorithm uses the schedule created by GSched to produce the maximally informative mapping of each pixel's observations onto the objects they stem from. Using GSched and QSL, TOD ignores distracting motion, correctly handles complicated occlusions, and naturally groups observations across cameras. The recovered sets of 2D masks are suitable for unsupervised training and initialization of object recognition and tracking systems.
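As a rough illustration of phase (1) only, the sketch below partitions the frame indices of a single pixel into sets of similar appearance. It is a toy under stated assumptions, not the procedure used by TOD: the scalar intensity observations, the function name `temporal_signature`, the tolerance `tol`, and the "compare against the first member of each set" rule are all hypothetical choices made for this example, since the abstract does not specify how signatures are extracted.

```python
from typing import List, Sequence


def temporal_signature(observations: Sequence[float], tol: float = 10.0) -> List[List[int]]:
    """Partition one pixel's frame indices into sets of similar appearance.

    Hypothetical toy rule: a frame joins the first existing set whose
    representative (the first value placed in that set) is within `tol`;
    otherwise it starts a new set.
    """
    groups: List[List[int]] = []        # each set holds frame indices
    representatives: List[float] = []   # first observed value of each set
    for t, value in enumerate(observations):
        for g, rep in enumerate(representatives):
            if abs(value - rep) <= tol:
                groups[g].append(t)
                break
        else:
            # No existing set is close enough: open a new one.
            groups.append([t])
            representatives.append(value)
    return groups


# One pixel observed over seven frames (made-up intensities).
pixel = [12, 14, 200, 198, 13, 90, 11]
signature = temporal_signature(pixel)
print(signature)
```

In this toy, frames that fall in the same set are treated as observations of the same surface, so a pixel that is uncovered again after an occlusion rejoins its earlier set. Explaining such per-pixel partitions with object lifetimes is the role the abstract assigns to GSched and QSL, which this sketch does not attempt.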
