The Surprising Effectiveness of Linear Models for Visual Foresight in Object Pile Manipulation

In this paper, we tackle the problem of pushing piles of small objects into a desired target set using visual feedback. Unlike conventional single-object manipulation pipelines, which estimate the state of the system parametrized by pose, the underlying physical state of this system is difficult to observe from images. Thus, we take the approach of reasoning directly in the space of images, and acquire the dynamics of visual measurements in order to synthesize a visual-feedback policy. We present a simple controller using an image-space Lyapunov function, and evaluate the closed-loop performance using three different classes of models for image prediction: deep-learning-based models for image-to-image translation, an object-centric model obtained from treating each pixel as a particle, and a switched-linear system where an action-dependent linear map is used. Through results in simulation and experiment, we show that for this task, a linear model works surprisingly well, achieving better prediction error, downstream task performance, and generalization to new environments than the deep models we trained on the same amount of data. We believe these results provide an interesting example in the spectrum of models that are most useful for vision-based feedback in manipulation, considering both the quality of visual prediction and compatibility with rigorous methods for control design and analysis.
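To make the switched-linear idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes images are flattened into vectors, actions are drawn from a small discrete set, one linear map per action is fit by ordinary least squares, and the image-space Lyapunov function is a per-pixel cost (for example, distance to the target set) summed against pixel intensity. The function names `fit_switched_linear`, `lyapunov`, and `greedy_action`, and the one-step greedy selection rule, are hypothetical choices made for illustration.

```python
import numpy as np


def fit_switched_linear(X, U, Y, n_actions):
    """Fit one linear map A[u] per discrete action u by least squares.

    X, Y: (N, d) arrays of flattened images before and after a push.
    U:    (N,) integer array of action indices in [0, n_actions).
    Returns A with shape (n_actions, d, d) such that y ~= A[u] @ x.
    """
    d = X.shape[1]
    A = np.zeros((n_actions, d, d))
    for u in range(n_actions):
        mask = U == u
        if mask.any():
            # Solve X_u @ W ~= Y_u in the least-squares sense; then A[u] = W.T.
            W, *_ = np.linalg.lstsq(X[mask], Y[mask], rcond=None)
            A[u] = W.T
    return A


def lyapunov(x, cost):
    """Assumed image-space Lyapunov candidate: cost-weighted pixel mass."""
    return float(cost @ x)


def greedy_action(A, x, cost):
    """Pick the action whose predicted next image has the lowest V."""
    predicted = A @ x            # shape (n_actions, d): one prediction per action
    values = predicted @ cost    # shape (n_actions,): V of each prediction
    return int(np.argmin(values))
```

Because both the model and this assumed Lyapunov function are linear in the image, evaluating every candidate action reduces to a pair of matrix products, which is one reason such a model is convenient for control design.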
