Leeds Robotic Commands

Leeds Robotic Commands is a dataset of real-world RGB-D scenes in which a robot manipulates various objects, paired with natural-language descriptions of these actions. The scenes were recorded with a Microsoft Kinect2 sensor, and the descriptions were annotated by non-expert volunteers. The dataset comprises 204 videos with 17,373 frames in total and 1,024 commands, an average of about five commands per video. The videos feature 51 different manipulated objects, including basic block shapes, fruits, cutlery, and office supplies.
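As a quick consistency check on the figures above, the per-video averages can be derived directly from the stated totals. This is a minimal sketch that uses only the numbers given in this description. It does not load the dataset itself, and the constant and function names are illustrative, not part of any official tooling.

```python
# Totals as stated in the dataset description (illustrative constants).
NUM_VIDEOS = 204
NUM_FRAMES = 17_373
NUM_COMMANDS = 1_024


def per_video(total: int, videos: int = NUM_VIDEOS) -> float:
    """Average a dataset-wide total over the number of videos."""
    return total / videos


commands_per_video = per_video(NUM_COMMANDS)
frames_per_video = per_video(NUM_FRAMES)

print(f"commands per video: {commands_per_video:.2f}")  # 5.02
print(f"frames per video:   {frames_per_video:.2f}")    # 85.16
```

The result (about 5.02 commands per video) matches the "about five per video" figure, and each video averages roughly 85 frames.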