Intelligence is a capability typically ascribed to animals, but not usually to plants. Animals can move, whereas plants cannot. Is mobility a necessary condition or a driving force for the emergence of intelligence? We hypothesize that mobility plays a foundational role in the evolution of animal and human intelligence and is therefore fundamentally important for understanding and creating embodied cognitive systems [1]. In this project, we aim to develop a new class of machine learning algorithms for mobile cognitive systems that actively collect data by sensing and interacting with the environment. We envision a new paradigm of autonomous AI that overcomes the limitations of the previous AI paradigms of top-down/rule-driven symbolic systems and bottom-up/data-driven statistical systems. Inspired by the dual-process theory of mind [2], we use mobile robot platforms to investigate the autonomous learning algorithms and demonstrate their capability in real-world home environments.

Introduction:

In the history of artificial intelligence (AI), two main approaches have emerged: symbolic and statistical systems. The former approach, or first-generation AI, is deductive and relies on rule-based programming; it can solve complex problems but faces difficulties in learning and adaptability. The latter approach, or second-generation AI, is inductive and relies on statistical learning from big data; it struggles with complex problems, learns slowly, and thus faces issues of scalability. To create human-level artificial intelligence, we need a methodology that combines the best of both approaches and also scales up to real, complex problems. Recent advances in deep learning provide a crucial lesson in this direction: building more expressive representations helps solve complex problems [3][4]. This provides evidence for an earlier prediction that “learning requires much more memory than we have thought to solve real-world problems” [5]. Deep learning models use much more memory than previous machine learning models, yet they do not overfit because data sizes have grown accordingly. However, deep learning models remain very limited in learning speed, flexibility, and robustness when applied to the dynamic environments of mobile cognitive agents. Why and how has the human brain evolved to learn so rapidly, flexibly, and robustly? We hypothesize that the brain evolved these properties mainly to support the mobility required for the survival of its body in hostile environments [1][6]. In fact, the brain’s main function is to make decisions and control body motion; higher functions such as memory and planning evolved on top of this substrate. Therefore, to achieve truly human-level AI, it is important to study higher-level intelligence, such as vision and language, on a mobile platform in a dynamic environment. It is our belief that fast, flexible, and robust learning in interactive mobile environments will give rise to a new paradigm of machine learning that will enable the next generation of autonomous AI systems. In this project, the ultimate goal is to demonstrate a mobile personal robot that learns objects, people, actions, events, episodes, and schedule plans over daily to extended periods of time. In the basic year of the project, we built a multi-module integrated system for mobile robots that perceives information (objects, people, actions) from the environment, acts (schedules, interacts) according to the perceived information, and develops models that learn the dynamics of the environment.
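To make this organization concrete, the following is a minimal structural sketch of such a perception-action-learning loop in Python. The class names, method signatures, and the env object are illustrative assumptions for exposition, not the project's actual code.

# Minimal structural sketch of the multi-module integrated system described above.
# All names and interfaces are illustrative assumptions, not the project's actual code.

class PerceptionModule:
    def perceive(self, observation):
        """Extract objects, people, and actions from raw sensor data (stubbed here)."""
        return {"objects": [], "people": [], "actions": []}


class ActionModule:
    def act(self, percepts):
        """Choose a scheduled behavior or interaction given the current percepts."""
        return "idle"


class LearningModule:
    def __init__(self):
        self.transitions = []

    def update(self, percepts, action):
        """Accumulate experience to model the dynamics of the environment."""
        self.transitions.append((percepts, action))


def run_cycle(env, perception, action, learning, steps=100):
    """One perception-action-learning loop: sense, decide, execute, learn."""
    for _ in range(steps):
        obs = env.sense()                    # sense the environment
        percepts = perception.perceive(obs)  # perception: objects, people, actions
        chosen = action.act(percepts)        # action: scheduling / interaction
        env.execute(chosen)                  # act on the environment
        learning.update(percepts, chosen)    # learning: environment dynamics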
We also demonstrated the integration of multimodal information in an interactive system that efficiently infers and responds to the goals and plans observed in the environment.

Experiments and Results:

a) Perception-Action-Learning System for Mobile Social-Service Robots

Making robots more human-like, capable of providing natural social services to customers in dynamic environments such as houses, restaurants, hotels, and even airports, has been a challenging goal for researchers in the field of social-service robotics. One promising approach is to develop an integrated system that combines methodologies from many different research areas. Such multi-module integrated intelligent robotic systems have been widely adopted, and their performance is well documented in previous studies [7][8]. However, because each module plays its own role in the integrated system, the perception modules often suffered from desynchronization with one another and from difficulty in adapting to dynamic environments [9]. This occurred because of the different processing times and scales of coverage of the adopted vision techniques [10]. To overcome these difficulties, developers usually upgraded or added expensive sensors (hardware) to the robot to improve performance. Although this may have mitigated some of the limitations, current robot systems still have difficulty with natural interaction in real-life, dynamic environments. We address this issue by designing a system that incorporates state-of-the-art deep learning methods and is inspired by the cognitive perception-action-learning cycle [11]. The implemented integrated system, which requires only an RGB-D camera and obstacle-detecting sensors (laser, bumper, sonar), achieved real-time performance on various social-service tasks. Moreover, by performing tasks robustly in real time, the robot could interact with people more naturally. As illustrated in Figure 1, our system's perception-action-learning cycle runs in real time (~0.2 s/cycle), where the arrows indicate the flow between modules. The system was implemented on a server with an Intel i7 CPU, 32 GB of RAM, and a GTX Titan GPU (12 GB). Communication between the server and the robot was achieved through ROS topics passed over a 5 GHz Wi-Fi connection (a minimal sketch of this communication pattern is given at the end of this subsection). The experiments were designed by the RoboCup@Home Committee and are described in its rulebook [12]; our system was able to perform all the scenarios with significantly improved results.

Figure 1. Perception-Action-Learning system for mobile social-service robots using deep learning.

RoboCup2017@Home Social Standard Platform League (SSPL): Winning First Place

We used our system on SoftBank Pepper, a standardized mobile social-service robot, and achieved the highest score in every scenario performed at the RoboCup2017@Home Social Standard Platform League (SSPL), winning first place overall. Our system allows robots to perform social-service tasks in real-life social situations with high performance in real time. Although our system does not yet meet every expectation of performance and processing speed, it highlights the importance of research not only on the individual elements but also on the integration of the modules for developing a more human-like robot to assist humans in the future.
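As noted above, the following is a minimal sketch, in Python with rospy, of how a single server-side perception node could exchange ROS topics with the robot over the Wi-Fi link. The topic names, message types, and the detector stub are assumptions for illustration; the actual system integrates several perception, action, and learning modules within its ~0.2 s cycle.

# Minimal sketch of a server-side perception node communicating over ROS topics.
# Topic names and the detector stub are illustrative assumptions, not the project's actual code.
import rospy
from sensor_msgs.msg import Image
from std_msgs.msg import String
from cv_bridge import CvBridge


class PerceptionNode(object):
    def __init__(self):
        self.bridge = CvBridge()
        self.latest_frame = None
        # The camera topic name is an assumption; on Pepper it is published by the robot's ROS driver.
        rospy.Subscriber('/camera/rgb/image_raw', Image, self.on_image, queue_size=1)
        # Downstream action modules would subscribe to this topic (name is illustrative).
        self.detections_pub = rospy.Publisher('/perception/objects', String, queue_size=1)

    def on_image(self, msg):
        # Keep only the most recent frame so each cycle processes fresh data.
        self.latest_frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')

    def detect_objects(self, frame):
        # Placeholder for a GPU-based deep object detector; returns a list of label strings.
        return []

    def spin(self):
        rate = rospy.Rate(5)  # about 0.2 s per cycle, matching the reported cycle time
        while not rospy.is_shutdown():
            if self.latest_frame is not None:
                labels = self.detect_objects(self.latest_frame)
                self.detections_pub.publish(String(data=','.join(labels)))
            rate.sleep()


if __name__ == '__main__':
    rospy.init_node('perception_node')
    PerceptionNode().spin()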
Related videos can be found at https://goo.gl/Pxnf1n, and our open-source code is available at https://github.com/soseazi/pal_pepper.

[Table 1] RoboCup2017@Home Social Standard Platform League (SSPL) Test 1 results.
References:

[1] D. Kahneman. Thinking, Fast and Slow. 2011.
[2] Byoung-Tak Zhang, et al. Teaching an Agent by Playing a Multimodal Memory Game: Challenges for Machine Learners and Human Teachers. AAAI Spring Symposium: Agents that Learn from Human Teachers, 2009.
[3] David Badre, et al. Cognitive control, hierarchy, and the rostro-caudal organization of the frontal lobes. Trends in Cognitive Sciences, 2008.
[4] Wei Liu, et al. SSD: Single Shot MultiBox Detector. ECCV, 2015.
[5] Nico Blodow, et al. RoboSherlock: Unstructured information processing for robot perception. IEEE International Conference on Robotics and Automation (ICRA), 2015.
[6] Sven Wachsmuth, et al. Deploying a modeling framework for reusable robot behavior to enable informed strategies for domestic service robots. Robotics and Autonomous Systems, 2014.
[7] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 2012.
[8] Luc Van Gool, et al. A mobile vision system for robust multi-person tracking. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[9] Yaser Sheikh, et al. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[10] Samy Bengio, et al. Show and tell: A neural image caption generator. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[11] Siddhartha S. Srinivasa, et al. The MOPED framework: Object recognition and pose estimation for manipulation. International Journal of Robotics Research, 2011.
[12] A. Clark. Being There: Putting Brain, Body, and World Together Again. 1996.
[13] Vicente Matellán Olivera, et al. A Motivational Architecture to Create more Human-Acceptable Assistive Robots for Robotics Competitions. ICARSC, 2016.
[14] Michael Beetz, et al. Scaling perception towards autonomous object manipulation — in knowledge lies the power. IEEE International Conference on Robotics and Automation (ICRA), 2016.
[15] Jun Miura, et al. A SIFT-based person identification using a distance-dependent appearance model for a person following robot. IEEE International Conference on Robotics and Biomimetics (ROBIO), 2012.
[16] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv, 2016.
[17] Ali Farhadi, et al. YOLO9000: Better, Faster, Stronger. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[18] Li Fei-Fei, et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[19] Jan Drugowitsch. Variational Bayesian inference for linear and logistic regression. arXiv:1310.5438, 2013.
[20] Danica Kragic, et al. A person following behaviour for a mobile robot. IEEE International Conference on Robotics and Automation (ICRA), 1999.
[21] Christian Szegedy, et al. DeepPose: Human Pose Estimation via Deep Neural Networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[22] Luca Iocchi, et al. RoboCup@Home: Scientific Competition and Benchmarking for Domestic Service Robots. 2009.
[23] Rodney A. Brooks, et al. A Robust Layered Control System for a Mobile Robot. 1986.
[24] Shane Legg, et al. Human-level control through deep reinforcement learning. Nature, 2015.
[25] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
[26] Matteo Munaro, et al. A feature-based approach to people re-identification using skeleton keypoints. IEEE International Conference on Robotics and Automation (ICRA), 2014.