Unsupervised Activity Recognition Using Latent Semantic Analysis on a Mobile Robot

We show that by using qualitative spatio-temporal abstraction methods, a mobile robot can learn common human movements and activities from long-term observation. Our novel framework encodes multiple qualitative abstractions of RGBD video of detected human activities, with each person represented by the output of a skeleton pose estimator. Analogously to information retrieval in text corpora, we use Latent Semantic Analysis (LSA) to uncover latent, semantically meaningful concepts in an unsupervised manner, where the vocabulary consists of occurrences of qualitative spatio-temporal features extracted from video clips, and the discovered concepts are regarded as activity classes. The limited field of view of a mobile robot presents a particular challenge, owing to the obscured, partial and noisy human detections and skeleton pose estimates from its environment. We show that the abstraction into a qualitative space helps the robot to generalise and compare multiple noisy and partial observations in a real-world dataset, and that a vocabulary of latent activity classes (expressed using qualitative features) can be recovered.
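The LSA step described above can be illustrated with a minimal sketch. It assumes a toy feature-by-clip count matrix, a plain truncated SVD with no term weighting, and a hypothetical helper named `lsa_concepts`. None of these come from the paper, whose actual features, weighting and number of concepts are not given in this abstract.

```python
import numpy as np


def lsa_concepts(counts, k):
    """Truncated SVD of a feature-by-clip count matrix (hypothetical helper).

    Rows are qualitative features (the "vocabulary"), columns are video clips
    (the "documents"). Returns the top-k feature loadings, the top-k singular
    values, and a k-dimensional embedding for every clip.
    """
    matrix = np.asarray(counts, dtype=float)
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    feature_loadings = u[:, :k]                  # shape: (n_features, k)
    clip_embeddings = (s[:k, None] * vt[:k]).T   # shape: (n_clips, k)
    return feature_loadings, s[:k], clip_embeddings


# Toy data (assumed, not from the paper): 6 qualitative features x 4 clips.
# Clips 0-1 share one group of features, clips 2-3 share another.
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
])

feature_loadings, singular_values, clip_embeddings = lsa_concepts(counts, k=2)

# Simplification: label each clip with the concept on which it has the
# largest absolute coordinate.
labels = np.abs(clip_embeddings).argmax(axis=1)
print(labels)
```

In this toy case the two latent concepts line up with the two groups of clips, so clips that share qualitative features receive the same label. Real observations would rarely be this cleanly separable, and the hard assignment by largest coordinate is only one of several ways to read activity classes off the concept space.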
