Infinite Latent Conditional Random Fields for Modeling Environments through Humans

Humans exert a substantial influence on their environments by interacting with them. Therefore, even though an environment may physically contain only objects, it cannot be modeled well without considering humans. In this paper, we model environments not only through objects but also through latent human poses and human-object interactions. However, the number of potential human poses is large and unknown, and human-object interactions vary not only in type but also in which human pose relates to each object. To handle these properties, we present Infinite Latent Conditional Random Fields (ILCRFs), which model a scene as a mixture of CRFs generated from Dirichlet processes. Each CRF represents one possible explanation of the scene. Beyond the visible object nodes and edges, an ILCRF generatively models the distribution of different CRF structures over the latent human nodes and their corresponding edges. We apply the model to the challenging problem of robotic scene arrangement. In extensive experiments, we show that our model significantly outperforms state-of-the-art methods. We further use our algorithm on a robot to place objects in a new scene.
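A Dirichlet-process prior is what lets the number of latent human poses stay open-ended instead of being fixed in advance. As a hedged illustration only (this is not the ILCRF inference procedure, whose details are not given here), the sketch below draws object-to-human assignments from a Chinese restaurant process, the sequential view of the partition a Dirichlet process induces. The names `crp_assignments` and `expected_num_humans` and the concentration parameter `alpha` are hypothetical choices for this example. The sketch only samples which latent human each object links to; it does not model poses, CRF edges, or potentials.

```python
import random


def crp_assignments(num_objects, alpha, rng):
    """Hypothetical helper: link each object to a latent human via a Chinese restaurant process.

    Object i joins existing human k with probability n_k / (i + alpha)
    and introduces a new latent human with probability alpha / (i + alpha).
    Requires alpha > 0.
    """
    assignments = []
    counts = []  # counts[k] = number of objects already linked to latent human k
    for _ in range(num_objects):
        # Weights for every existing human, plus one slot for a brand-new human.
        # Their sum is (objects seen so far) + alpha, matching the CRP normalizer.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new latent human
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments


def expected_num_humans(num_objects, alpha):
    """Expected number of distinct latent humans after num_objects draws: sum_i alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(num_objects))


if __name__ == "__main__":
    rng = random.Random(0)
    labels = crp_assignments(10, alpha=1.0, rng=rng)
    print("object -> latent human:", labels)
    print("latent humans used:", len(set(labels)))
    print("expected number of latent humans:", expected_num_humans(10, 1.0))
```

The expected number of latent humans grows roughly logarithmically with the number of objects, so the prior never caps how many human poses a scene may need, while larger `alpha` values favor introducing more of them.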
