A Situationally Aware Voice‐commandable Robotic Forklift Working Alongside People in Unstructured Outdoor Environments

One long-standing challenge in robotics is the realization of mobile autonomous robots able to operate safely in human workplaces and to be accepted by their human occupants. We describe the development of a multi-ton robotic forklift intended to operate alongside people and vehicles, handling palletized materials within existing, active outdoor storage facilities. The system has four novel characteristics. The first is a multimodal interface that allows users to efficiently convey task-level commands to the robot using a combination of pen-based gestures and natural language speech. These tasks include the manipulation, transport, and placement of palletized cargo within dynamic, human-occupied warehouses. The second is the robot's ability to learn the visual identity of an object from a single user-provided example and use the learned model to reliably and persistently detect objects despite significant spatial and temporal excursions. The third is a reliance on local sensing that allows the robot to handle variable palletized cargo and navigate within dynamic, minimally prepared environments without a global positioning system. The fourth concerns the robot's operation in close proximity to people, including its human supervisor, pedestrians who may cross or block its path, moving vehicles, and forklift operators who may climb inside the robot and operate it manually. This is made possible by interaction mechanisms that facilitate safe, effective operation around people. This paper provides a comprehensive description of the system's architecture and implementation, indicating how real-world operational requirements motivated key design choices. We offer qualitative and quantitative analyses of the robot operating in real settings and discuss the lessons learned from our effort.
