Incorporating hand gesture recognition in human–robot interaction has the potential to provide a natural way of communication, thus contributing to a more fluid collaboration toward optimizing the efficiency of the application at hand and overcoming possible challenges. A very promising field of interest is agriculture, owing to its complex and dynamic environments. The aim of this study was twofold: (a) to develop a real-time skeleton-based recognition system for five hand gestures using a depth camera and machine learning, and (b) to enable a real-time human–robot interaction framework and test it in different scenarios. For this purpose, six machine learning classifiers were tested, while the Robot Operating System (ROS) software was utilized for “translating” the gestures into five commands to be executed by the robot. Furthermore, the developed system was successfully tested in outdoor experimental sessions that included either one or two persons. In the last case, the robot, based on the recognized gesture, could distinguish which of the two workers required help, follow the “locked” person, stop, return to a target location, or “unlock” them. For the sake of safety, the robot navigated with a preset socially accepted speed while keeping a safe distance in all interactions.