The Role of Multisensory Data for Automatic Segmentation of Manipulation Skills

Due to improvements in technology and the declining cost of robots, there is an increasing push to place robots in homes. However, robots placed in unstructured environments such as the home must deal with the uncertainties of those environments. A multisensory representation of the world is important for robust interaction, and recent work in the area shows that representations over multiple sensory modalities can be learned from data collected through exploration and are beneficial for manipulation [12, 13, 10, 6]. Take, for example, a lamp similar to the one seen in Fig. 1. While the robot could rely on visual information to determine whether the light has turned on, it can also use touch to detect the change in pressure and sound to hear the click of the switch. This allows the robot to naturally handle contingency cases (e.g., the light bulb is broken). Furthermore, a robot could adapt its control schemes to the environment by using feedback from each of its sensory modalities (e.g., pull until it feels a particular force, hears a click, or sees light). This also shows how skills can be represented as low-level primitives (e.g., pull, grasp), which connect naturally to the different sensory modalities.

For a robot to quickly learn about the different sensory modalities and utilize them for manipulation, we use learning from demonstration (LfD) [2] to gather data of various manipulation skills. From these demonstrations, we use segmentation to separate the skills into low-level primitives, each of which is modeled individually, similar to work in [9, 7, 8, 11, 3]. This allows the robot to adapt its actions and to repeat and/or select different segments to successfully complete a task.

While prior work segments trajectories autonomously, it does not utilize all three sensory modalities (i.e., visual, haptic, and audio). Furthermore, approaches that do combine modalities (typically visual and proprioceptive) use carefully crafted feature spaces (e.g., contact features for pick-and-place) to guide segmentation. The closest work that uses all sensory inputs is Kappler et al. [6], who expand the notion of Associated Skill Memories (ASMs) to include multimodal information about the effects of actions, with visual, haptic, and auditory sensory inputs. However, this work does not segment the trajectory automatically; instead, it relies on a provided manipulation graph. Unlike prior work, we investigate how all three sensory modalities can be used to automatically segment demonstrations of manipulation skills.

Fig. 1: Our experimental platform with the four different objects it interacted with. From left to right: a lamp, a pasta jar, a plastic drawer set, and a breadbox.
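The abstract does not spell out the segmentation algorithm itself. As a point of reference, the sketch below is a minimal, hypothetical illustration (not the authors' method) of the general idea: fuse synchronized visual, haptic, and audio feature streams into a single novelty signal and propose segment boundaries where that signal peaks. The specific features (scene brightness, force magnitude, audio energy) and the window/threshold parameters are assumptions made for illustration only.

```python
"""Minimal sketch of multimodal changepoint segmentation.

Hypothetical illustration only: fuses synchronized visual, haptic, and audio
feature streams to propose segment boundaries in a demonstration trajectory.
"""
import numpy as np


def zscore(x):
    """Standardize a 1-D feature stream (small epsilon avoids divide-by-zero)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-8)


def propose_segments(visual, haptic, audio, window=10, threshold=1.5):
    """Propose boundaries where the fused multimodal signal changes sharply.

    visual, haptic, audio: 1-D arrays of per-timestep features (e.g., scene
    brightness, force magnitude, audio RMS energy), sampled at the same rate.
    """
    streams = [zscore(s) for s in (visual, haptic, audio)]
    T = len(streams[0])
    novelty = np.zeros(T)
    # Per modality: absolute difference between the means of the windows
    # just before and just after each timestep, averaged across modalities.
    for s in streams:
        for t in range(window, T - window):
            before = s[t - window:t].mean()
            after = s[t:t + window].mean()
            novelty[t] += abs(after - before)
    novelty /= len(streams)

    # A boundary is a local maximum of the novelty signal above the threshold.
    boundaries = [t for t in range(1, T - 1)
                  if novelty[t] > threshold
                  and novelty[t] >= novelty[t - 1]
                  and novelty[t] >= novelty[t + 1]]
    return boundaries, novelty


if __name__ == "__main__":
    # Synthetic demonstration: at t = 100 the force drops, the scene gets
    # brighter, and a short "click" transient appears in the audio stream.
    rng = np.random.default_rng(0)
    T = 200
    haptic = np.concatenate([np.full(100, 5.0), np.full(100, 1.0)]) + rng.normal(0, 0.2, T)
    visual = np.concatenate([np.zeros(100), np.ones(100)]) + rng.normal(0, 0.05, T)
    audio = rng.normal(0.0, 0.05, T)
    audio[100:103] += 1.0  # transient click
    cuts, _ = propose_segments(visual, haptic, audio)
    print("proposed segment boundaries:", cuts)
```

In this toy example all three modalities agree on a single boundary, so a naive fused novelty signal suffices; the point of using multisensory data is precisely that events such as a switch click may be weak or invisible in one modality but salient in another.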

[1] Brett Browning et al., A survey of robot learning from demonstration, Robotics and Autonomous Systems, 2009.

[2] Jivko Sinapov et al., Vibrotactile Recognition and Categorization of Surfaces by a Humanoid Robot, IEEE Transactions on Robotics, 2011.

[3] Maya Cakmak et al., Keyframe-based Learning from Demonstration, International Journal of Social Robotics, 2012.

[4] Scott Kuindersma et al., Robot learning from demonstration by constructing skill trees, International Journal of Robotics Research, 2012.

[5] Oliver Kroemer et al., Learning robot tactile sensing for object manipulation, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2014.

[6] Connor Schenck et al., Grounding semantic categories in behavioral interactions: Experiments with 100 objects, Robotics and Autonomous Systems, 2014.

[7] Gaurav S. Sukhatme et al., An autonomous manipulation system based on force control and optimization, Autonomous Robots, 2014.

[8] Stefan Schaal et al., Data-Driven Online Decision Making for Autonomous Manipulation, Robotics: Science and Systems, 2015.

[9] Scott Niekum et al., Learning grounded finite-state representations from unstructured demonstrations, International Journal of Robotics Research, 2015.

[10] Oliver Kroemer et al., Towards learning hierarchical skills for multi-phase manipulation tasks, IEEE International Conference on Robotics and Automation (ICRA), 2015.

[11] Andrea Lockerd Thomaz et al., Learning haptic affordances from demonstration and human-guided exploration, IEEE Haptics Symposium (HAPTICS), 2016.

[12] Vivian Chu et al., Learning object affordances by leveraging the combination of human-guidance and self-exploration, 2016.

[13] Charles C. Kemp et al., Multimodal execution monitoring for anomaly detection during robot manipulation, IEEE International Conference on Robotics and Automation (ICRA), 2016.