Real-Time Dynamic Object Recognition and Grasp Detection for Robotic Arm using Streaming Video: A Design for Visually Impaired Persons

Robotic arms have been steadily gaining popularity in industries such as manufacturing, medicine, and aerospace. Research is also ongoing into how robotic arms can improve the lives of ordinary citizens. Physically impaired individuals often find it difficult to complete tasks such as obtaining an object from a shelf, opening a refrigerator door, obtaining food or drink from the refrigerator, or picking a needed item from a drawer. As a result, they must rely on a caregiver to help with such tasks or to complete the tasks for them. In such cases, a robotic arm could potentially be used to help a visually impaired individual pick up items. Prior work has described a system that takes audio commands from an individual, applies natural language processing to identify the action the user requires, and combines this with object recognition and grasp detection to identify the object in the refrigerator. However, that work had two limitations: its components were not integrated, and grasp point detection from images was inefficient. This work builds on existing research by integrating the object recognition and grasp detection components. It describes a technique that dynamically adjusts the size of the object image sent to the grasp point detection module by extracting, from the stream of camera images, only the portion necessary for grasp detection. The approach performs object recognition, identifies the portion of the object suitable for grasping (i.e., the handle), and dynamically determines the buffer zone around the handle for use in the grasp detection module as separate steps. This reduces the amount of data transmitted between steps and enables real-time object recognition and grasp identification. The integrated framework, which combines object recognition using a pre-trained model with a dynamic grasp point detection program, was successfully tested with an Intel RealSense D455 camera.
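The cropping step described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the abstract does not state how the buffer zone is computed, so the proportional margin rule and the names `Box`, `padded_crop_box`, `crop`, `margin_ratio`, and `min_margin` are all assumptions introduced for illustration. The sketch takes a handle bounding box (as an object recognizer might report it), expands it by a margin that scales with the handle size, clamps the result to the frame, and extracts only that region for a downstream grasp detector.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned pixel box; x1 and y1 are exclusive (hypothetical type)."""
    x0: int
    y0: int
    x1: int
    y1: int


def padded_crop_box(handle: Box, frame_w: int, frame_h: int,
                    margin_ratio: float = 0.25, min_margin: int = 8) -> Box:
    """Expand the handle box by a buffer zone and clamp it to the frame.

    Assumption: the buffer is a fixed fraction of the handle's width and
    height, with a small floor so thin handles still get some context.
    """
    width = handle.x1 - handle.x0
    height = handle.y1 - handle.y0
    margin_x = max(min_margin, round(width * margin_ratio))
    margin_y = max(min_margin, round(height * margin_ratio))
    return Box(
        max(0, handle.x0 - margin_x),
        max(0, handle.y0 - margin_y),
        min(frame_w, handle.x1 + margin_x),
        min(frame_h, handle.y1 + margin_y),
    )


def crop(frame, box: Box):
    """Return only the rows and columns of the frame that fall inside box.

    The frame is a list of pixel rows here to keep the sketch dependency-free.
    """
    return [row[box.x0:box.x1] for row in frame[box.y0:box.y1]]


if __name__ == "__main__":
    frame_w, frame_h = 640, 480
    frame = [[0] * frame_w for _ in range(frame_h)]  # placeholder image
    handle = Box(100, 50, 140, 250)                  # hypothetical detection
    region = padded_crop_box(handle, frame_w, frame_h)
    patch = crop(frame, region)
    share = (len(patch) * len(patch[0])) / (frame_w * frame_h)
    print(region, f"{share:.1%} of the frame is passed on")
```

With an array-based image (for example a NumPy array), the same region could be taken with a single slice, `frame[y0:y1, x0:x1]`. Because the margin scales with the detected handle, the patch passed to the grasp detector grows and shrinks with the object rather than staying at a fixed size, which is the property the data-reduction argument relies on.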