Multiple-view object recognition in band-limited distributed camera networks

In this paper, we study the classical problem of object recognition in low-power, low-bandwidth distributed camera networks. Robust object recognition is crucial for applications such as visual surveillance, where the system must track and identify objects of interest and compensate for visual nuisances such as occlusion and pose variation between multiple camera views. We propose an effective framework for distributed object recognition using a network of smart cameras and a computer as the base station. Because the bandwidth between the cameras and the computer is limited, the method uses the computational power available on the smart sensors to locally extract and compress SIFT-type image features that represent the individual camera views. In particular, we show that across a network of cameras, high-dimensional SIFT histograms share a joint sparse pattern corresponding to a set of common features in 3-D. Such joint sparse patterns can be explicitly exploited to accurately encode the distributed signal via random projection, which is unsupervised and independent of the sensor modality. On the base station, we study multiple decoding schemes that simultaneously recover the multiple-view object features based on distributed compressive sensing theory. The system has been implemented on the Berkeley CITRIC smart camera platform. The efficacy of the algorithm is validated through extensive simulation and experiments.
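
The encode-then-jointly-decode idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say which decoders are studied, so the sketch substitutes simultaneous orthogonal matching pursuit, a standard greedy decoder for jointly sparse signals. It also assumes that every camera uses the same Gaussian projection matrix and that all views share exactly the same support. The names `encode` and `somp`, the problem sizes, and the synthetic histograms are all hypothetical.

```python
import numpy as np

def encode(histogram, projection):
    """Camera side: compress one sparse histogram with a random projection."""
    return projection @ histogram

def somp(projection, measurements, sparsity):
    """Base-station side: simultaneous OMP (an illustrative decoder choice).

    Recovers the columns of X from Y = A X, assuming all columns of X
    share one support of size `sparsity`.
    """
    n_bins = projection.shape[1]
    residual = measurements.copy()
    support = []
    coeffs = np.zeros((0, measurements.shape[1]))
    for _ in range(sparsity):
        # Score each histogram bin by its total correlation with all residuals.
        scores = np.abs(projection.T @ residual).sum(axis=1)
        if support:
            scores[support] = 0.0  # never pick the same bin twice
        support.append(int(np.argmax(scores)))
        # Least-squares fit on the current support, for all cameras at once.
        atoms = projection[:, support]
        coeffs, *_ = np.linalg.lstsq(atoms, measurements, rcond=None)
        residual = measurements - atoms @ coeffs
    recovered = np.zeros((n_bins, measurements.shape[1]))
    recovered[support, :] = coeffs
    return recovered, sorted(support)

# Synthetic data (assumption): 3 cameras, 512-bin histograms, 5 shared nonzero bins.
rng = np.random.default_rng(0)
n_bins, n_meas, n_cams, k = 512, 128, 3, 5
true_support = sorted(rng.choice(n_bins, size=k, replace=False).tolist())
X = np.zeros((n_bins, n_cams))
X[true_support, :] = rng.integers(3, 10, size=(k, n_cams))

# Shared random Gaussian projection; each camera transmits 128 numbers, not 512.
A = rng.standard_normal((n_meas, n_bins)) / np.sqrt(n_meas)
Y = np.column_stack([encode(X[:, c], A) for c in range(n_cams)])

X_hat, est_support = somp(A, Y, k)
print("support recovered:", est_support == true_support)
print("max abs error:", float(np.abs(X_hat - X).max()))
```

Summing the correlation scores over all cameras is what exploits the joint sparsity: a bin that is active in every view accumulates evidence from every measurement vector, so fewer measurements per camera are needed than when each view is decoded on its own.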
