Multi-Observations Newscast EM for Distributed Multi-Camera Tracking

Visual surveillance in wide areas (e.g. airports) relies on multiple cameras which observe non-overlapping scenes. The focus of this thesis is multi-person tracking, where the task is to maintain a person's identity when he or she leaves the field of view of one camera and later re-appears at another camera. While current wide-area tracking systems are centralized, we propose a distributed system in which every camera learns from both its own observations and communication with other cameras. Multi-person tracking can be seen as a data association problem, where the observations of the same person, gathered from different cameras, have to be clustered into trajectories. To establish this correspondence between a person's identity and an observation, we use appearance features (such as colour or height) and spatial-temporal features (such as the motion of a person from one camera to another). Under the assumption that all appearances of a single person are Gaussian distributed, the appearance model in our approach consists of a Mixture of Gaussians, whose parameters can be learned with the Expectation-Maximization (EM) algorithm. In this thesis we introduce Multi-Observations Newscast EM to learn the parameters of the Mixture of Gaussians from distributed observations. Each camera learns its own Mixture of Gaussians model, and Multi-Observations Newscast EM uses a gossip-based protocol for the M-step. We provide theoretical evidence, and show experimentally, that within an M-step the estimates of each camera converge exponentially fast to the correct values. We propose to initialize Multi-Observations Newscast EM with a distributed K-Means to improve performance. We found that Multi-Observations Newscast EM performs as well as a standard EM algorithm.
In this thesis we present two probabilistic models for multi-person tracking. The first model uses only appearance features, while the second model also uses spatial-temporal features. Both models are implemented in a central system and in a distributed system. The distributed system uses Multi-Observations Newscast EM, while the central system uses a standard EM. The two models are tested on artificial data and on a collection of real-world observations gathered by several cameras in a university building. The results show that the central system and the distributed system perform equally well, and that the more elaborate model, which uses both appearance features and spatial-temporal features, outperforms the simple model.
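
The gossip-based M-step rests on distributed averaging: each camera holds a local statistic, and repeated pairwise exchanges drive every camera towards the network-wide mean without a central node. The following is a minimal sketch of plain pairwise gossip averaging, not the Multi-Observations variant introduced in the thesis. The function names `gossip_average` and `spread`, the scalar statistics, and the uniform random choice of peer are assumptions made for illustration.

```python
import random


def gossip_average(values, rounds, seed=0):
    """Pairwise gossip averaging (illustrative sketch only).

    In every round each node contacts one other node chosen uniformly
    at random, and both replace their estimate with the pair's mean.
    The sum of all estimates is preserved, so the only value all nodes
    can agree on is the global mean.
    """
    rng = random.Random(seed)
    estimates = list(values)
    n = len(estimates)
    for _ in range(rounds):
        for i in range(n):
            # Pick a peer j != i uniformly at random.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            mean = 0.5 * (estimates[i] + estimates[j])
            estimates[i] = mean
            estimates[j] = mean
    return estimates


def spread(estimates):
    """Largest disagreement between any two nodes."""
    return max(estimates) - min(estimates)


if __name__ == "__main__":
    # Hypothetical local statistics held by eight cameras.
    local_stats = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    for rounds in (0, 5, 10, 20):
        result = gossip_average(local_stats, rounds)
        print(rounds, spread(result))
```

In an EM setting, the averaged quantities would be the responsibility-weighted sufficient statistics of each mixture component rather than single scalars; printing the spread for an increasing number of rounds gives a rough picture of how quickly the cameras come to agree.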
