Multiple-View Object Recognition in Smart Camera Networks

We study object recognition in low-power, low-bandwidth smart camera networks. Robust object recognition is crucial for applications such as visual surveillance, where a system must track and identify objects of interest while overcoming visual nuisances such as occlusion and pose variation across multiple camera views. To accommodate the limited bandwidth between the cameras and the base-station computer, the method uses the computational power available on the smart sensors to extract SIFT-type image features locally to represent individual camera views. We show that, across a network of cameras, high-dimensional SIFT histograms exhibit a joint sparse pattern corresponding to a set of shared features in 3-D. This joint sparsity can be exploited explicitly to encode the distributed signal via random projections. At the network station, multiple decoding schemes are studied to simultaneously recover the multiple-view object features based on distributed compressive sensing theory. The system has been implemented on the Berkeley CITRIC smart camera platform. The efficacy of the algorithm is validated through extensive simulations and experiments.
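The encode-then-jointly-decode idea described above can be illustrated with a small sketch. It is not the paper's implementation: the decoder here is a greedy simultaneous orthogonal matching pursuit, swapped in for brevity, and the function names, the single shared Gaussian matrix, the toy dimensions, and the two synthetic histograms are all assumptions made for illustration.

```python
import random

def gaussian_matrix(m, n, seed=0):
    """m x n random projection matrix with i.i.d. N(0, 1/m) entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(n)] for _ in range(m)]

def project(A, x):
    """Encode a length-n histogram x as m random measurements y = A x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def solve(M, b):
    """Solve a small square system M z = b (Gaussian elimination, partial pivoting)."""
    k = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, k):
            f = aug[r][c] / aug[c][c]
            for j in range(c, k + 1):
                aug[r][j] -= f * aug[c][j]
    z = [0.0] * k
    for c in range(k - 1, -1, -1):
        z[c] = (aug[c][k] - sum(aug[c][j] * z[j] for j in range(c + 1, k))) / aug[c][c]
    return z

def least_squares(A, support, y):
    """Least-squares fit of y on the columns of A indexed by support (normal equations)."""
    cols = [[row[j] for row in A] for j in support]
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(gram, rhs)

def somp(A, ys, k):
    """Greedy joint decoder: choose k columns shared by all views, then fit each view."""
    n = len(A[0])
    residuals = [y[:] for y in ys]
    support, coeffs = [], []
    for _ in range(k):
        def score(j):
            # Total absolute correlation of column j with every view's residual.
            col = [row[j] for row in A]
            return sum(abs(sum(c * r for c, r in zip(col, res))) for res in residuals)
        best = max((j for j in range(n) if j not in support), key=score)
        support.append(best)
        coeffs = [least_squares(A, support, y) for y in ys]
        residuals = [
            [yi - sum(row[j] * c for j, c in zip(support, cf)) for yi, row in zip(y, A)]
            for y, cf in zip(ys, coeffs)
        ]
    return support, coeffs, residuals

# Toy setup (assumed sizes): 64-bin histograms compressed to 24 measurements.
n, m, k = 64, 24, 3
A = gaussian_matrix(m, n, seed=1)
# Two hypothetical camera views whose sparse histograms share the same nonzero bins.
x1, x2 = [0.0] * n, [0.0] * n
for j, (a, b) in zip((5, 20, 41), ((3.0, 2.0), (1.0, 4.0), (2.0, 2.0))):
    x1[j], x2[j] = a, b
ys = [project(A, x1), project(A, x2)]            # what each camera would transmit
support, coeffs, residuals = somp(A, ys, k)      # joint recovery at the station
print(sorted(support))
```

Each camera sends only `m` numbers instead of the full `n`-bin histogram, and the decoder pools evidence from all views when choosing which bins are active. An ℓ1-minimization decoder with a nonnegativity constraint would be closer in spirit to compressive sensing recovery, but it would need a linear-programming solver and is omitted here.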
