Using context with statistical relational models: object recognition from observing user activity in home environment

Object recognition from images in a home environment is challenging because objects often appear at low resolution and the scene is usually cluttered. However, many objects serve specific functions for the user, and the interactions between the user and an object provide useful contextual information for recognizing it. In this paper, we use a Markov logic network (MLN) to model such context information as relationships between objects and user activities. We demonstrate that an MLN provides a flexible way to incorporate relational context information using the syntax of first-order logic. It is also a probabilistic graphical model, so it handles uncertainty in the knowledge base, observations and decisions. In our experiment, objects in the living room and kitchen of a home are recognized based only on the user's activity. The user's activity is analyzed from images captured by cameras installed in the home. The relationships between user activities and objects are defined in an MLN knowledge base. Experiments show that objects in the home can be recognized irrespective of their position, size and appearance in the image.
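The idea of scoring object labels with weighted activity-object rules can be illustrated with a minimal sketch. It is not the knowledge base or inference procedure used in the paper: the rules, weights, activity names and object names below are hypothetical, and the sketch assumes a single image region that carries exactly one object label. Each rule reads "if the activity is observed at the region, the region is this object", and, as in an MLN, the probability of a labeling is proportional to the exponential of the summed weights of the rules it satisfies.

```python
import math

# Hypothetical weighted rules (activity, object, weight); not taken from the paper.
RULES = [
    ("sitting", "sofa", 2.0),
    ("sitting", "chair", 1.5),
    ("cutting", "worktop", 2.5),
    ("typing", "table", 1.0),
]
OBJECTS = ["sofa", "chair", "worktop", "table"]


def object_posterior(observed_activities, rules=RULES, objects=OBJECTS):
    """Return P(object label | observed activities) for one region.

    Assumes exactly one label per region, so each candidate label is one
    possible world. A world's score is the sum of weights of the rules it
    satisfies; probabilities are proportional to exp(score).
    """
    observed = set(observed_activities)
    scores = {}
    for candidate in objects:
        total = 0.0
        for activity, obj, weight in rules:
            # An implication holds when its antecedent is false
            # (activity not observed) or its consequent is true.
            if activity not in observed or obj == candidate:
                total += weight
        scores[candidate] = total
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    unnormalized = {o: math.exp(s - max_score) for o, s in scores.items()}
    z = sum(unnormalized.values())
    return {o: v / z for o, v in unnormalized.items()}


if __name__ == "__main__":
    for label, p in object_posterior(["sitting"]).items():
        print(f"{label}: {p:.3f}")
```

With no observed activity every rule is trivially satisfied, so all labels are equally likely. Observing an activity penalizes the labels that contradict the rules for that activity, which is how soft (weighted) rules tolerate noisy activity detections instead of ruling labels out entirely.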
