Kinsight: Localizing and Tracking Household Objects Using Depth-Camera Sensors

We address the problem of localizing and tracking household objects using a depth-camera sensor network. We design and implement Kinsight, a system that tracks household objects indirectly: it tracks human figures, then detects and recognizes objects from human-object interactions. We devise two novel algorithms: (1) Depth Sweep, which uses depth information to efficiently extract objects from an image, and (2) Context Oriented Object Recognition, which uses location history and activity context along with an RGB image to recognize objects at home. We thoroughly evaluate Kinsight's performance with a rich set of controlled experiments. We also deploy Kinsight in real-world scenarios and show that it achieves an average localization error of about 13 cm.
