Learning Instance Segmentation by Interaction

Objects are a fundamental component of visual perception. How are humans able to effortlessly reorganize their visual observations into a discrete set of objects is a question that has puzzled researchers for centuries. The Gestalt school of thought put forth the proposition that humans use similarity in color, texture and motion to group pixels into individual objects [21]. Various methods for object segmentation based on color and texture cues have been proposed [3, 6, 7, 14, 16]. These approaches are, however, known to over-segment multi-colored and textured objects.

[1]  M. Wertheimer Laws of organization in perceptual forms. , 1938 .

[2]  P. J. Huber Robust Estimation of a Location Parameter , 1964 .

[3]  J. Gibson The Ecological Approach to Visual Perception , 1979 .

[4]  Frederick R. Forst,et al.  On robust estimation of the location parameter , 1980 .

[5]  Yiannis Aloimonos,et al.  Active vision , 2004, International Journal of Computer Vision.

[6]  Elizabeth S. Spelke,et al.  Principles of Object Perception , 1990, Cogn. Sci..

[7]  Paul M. Fitzpatrick,et al.  First contact: an active vision approach to segmentation , 2003, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453).

[8]  Daniel P. Huttenlocher,et al.  Efficient Graph-Based Image Segmentation , 2004, International Journal of Computer Vision.

[9]  Oliver Brock,et al.  Interactive segmentation for manipulation in unstructured environments , 2009, 2009 IEEE International Conference on Robotics and Automation.

[10]  Nicholas J. Butko,et al.  Active perception , 2010 .

[11]  Danica Kragic,et al.  Active 3D scene segmentation and detection of unknown objects , 2010, 2010 IEEE International Conference on Robotics and Automation.

[12]  Jianbo Shi,et al.  Multi-hypothesis motion planning for visual object tracking , 2011, 2011 International Conference on Computer Vision.

[13]  Danica Kragic,et al.  YES - YEt another object segmentation: Exploiting camera movement , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[14]  Cristian Sminchisescu,et al.  CPMC: Automatic Object Segmentation Using Constrained Parametric Min-Cuts , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Anthony Cowley,et al.  Perception and motion planning for pick-and-place of dynamic objects , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[16]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[17]  Alexei A. Efros,et al.  Context as Supervisory Signal: Discovering Objects with Predictable Context , 2014, ECCV.

[18]  Alan Yuille,et al.  Active Vision , 2014, Computer Vision, A Reference Guide.

[19]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[20]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Edward H. Adelson,et al.  Crisp Boundary Detection Using Pointwise Mutual Information , 2014, ECCV.

[22]  Oliver Kroemer,et al.  Probabilistic Segmentation and Targeted Exploration of Objects in Cluttered Environments , 2014, IEEE Transactions on Robotics.

[23]  Vladlen Koltun,et al.  Geodesic Object Proposals , 2014, ECCV.

[24]  Joni Pajarinen,et al.  Decision making under uncertain segmentations , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[25]  Ronan Collobert,et al.  Learning to Segment Object Candidates , 2015, NIPS.

[26]  Nitish Srivastava Unsupervised Learning of Visual Representations using Videos , 2015 .

[27]  Trevor Darrell,et al.  Constrained Convolutional Neural Networks for Weakly Supervised Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[28]  Gaurav S. Sukhatme,et al.  Interactive Segmentation of Textured and Textureless Objects , 2015 .

[29]  Kristen Grauman,et al.  Learning Image Representations Tied to Ego-Motion , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30]  George Papandreou,et al.  Weakly-and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  Jitendra Malik,et al.  Learning to See by Moving , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[33]  Paolo Favaro,et al.  Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.

[34]  Lucian Busoniu,et al.  Handling Uncertainty and Networked Structure in Robot Control , 2016 .

[35]  Martial Hebert,et al.  Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification , 2016, ECCV.

[36]  Alexei A. Efros,et al.  Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Alexei A. Efros,et al.  Colorful Image Colorization , 2016, ECCV.

[38]  Jitendra Malik,et al.  Learning to Poke by Poking: Experiential Learning of Intuitive Physics , 2016, NIPS.

[39]  Andrew Owens,et al.  Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.

[40]  Abhinav Gupta,et al.  The Curious Robot: Learning Visual Representations via Physical Interactions , 2016, ECCV.

[41]  Abhinav Gupta,et al.  Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[42]  Dieter Fox,et al.  SE3-nets: Learning rigid body motion using deep neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[43]  Trevor Darrell,et al.  Adversarial Feature Learning , 2016, ICLR.

[44]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Trevor Darrell,et al.  Learning Features by Watching Objects Move , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Xinyu Liu,et al.  Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics , 2017, Robotics: Science and Systems.

[47]  Abhinav Gupta,et al.  Learning to fly by crashing , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[48]  Jitendra Malik,et al.  Combining self-supervised learning and imitation for vision-based rope manipulation , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[49]  Paolo Favaro,et al.  Representation Learning by Learning to Count , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[50]  Sergey Levine,et al.  Deep visual foresight for planning robot motion , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[51]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[52]  Sergey Levine,et al.  Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection , 2016, Int. J. Robotics Res..

[53]  Jitendra Malik,et al.  Zero-Shot Visual Imitation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[54]  John K. Tsotsos,et al.  Revisiting active perception , 2016, Autonomous Robots.