Semantic Event Fusion of Different Visual Modality Concepts for Activity Recognition

Combining multimodal concept streams from heterogeneous sensors has been explored only superficially for activity recognition. Most studies consider simple sensors in nearly perfect conditions, where temporal synchronization is guaranteed. More sophisticated fusion schemes adopt problem-specific graphical representations of events that are generally tightly coupled to their training data and focused on a single sensor. This paper proposes a hybrid framework that combines knowledge-driven and probabilistic methods for event representation and recognition. It separates semantic modeling from raw sensor data by using an intermediate semantic representation, namely concepts. It introduces an algorithm for sensor alignment that uses concept similarity as a surrogate for the inaccurate temporal information of real-life scenarios. Finally, it proposes the combined use of an ontology language, to overcome the rigidity of previous approaches to model definition, and a probabilistic interpretation of ontological models, which equips the framework with a mechanism to handle noisy and ambiguous concept observations, an ability that most knowledge-driven methods lack. We evaluate our contributions on multimodal recordings of elderly people carrying out instrumental activities of daily living (IADLs). Results demonstrate that the proposed framework outperforms baseline methods both in event recognition performance and in delimiting the temporal boundaries of event instances.
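
The idea of aligning sensors by concept similarity rather than by timestamps can be illustrated with a minimal sketch. The abstract does not specify the alignment algorithm, so the code below is only one plausible reading: it assumes each sensor produces a sequence of concept-score vectors, uses cosine similarity as the concept similarity, and aligns the two sequences with classic dynamic time warping. The names `concept_similarity` and `align_streams` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def concept_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two concept-score vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def align_streams(
    s1: Sequence[Sequence[float]], s2: Sequence[Sequence[float]]
) -> List[Tuple[int, int]]:
    """Align two concept streams with dynamic time warping.

    Returns a list of (index_in_s1, index_in_s2) pairs. Timestamps are
    never consulted; only concept similarity drives the alignment.
    """
    n, m = len(s1), len(s2)
    inf = float("inf")
    # cost[i][j] = minimal accumulated dissimilarity aligning s1[:i] with s2[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 1.0 - concept_similarity(s1[i - 1], s2[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end of both streams to recover the warping path.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # advance both streams
            (cost[i - 1][j], i - 1, j),          # advance s1 only
            (cost[i][j - 1], i, j - 1),          # advance s2 only
        ]
        _, i, j = min(candidates)
    path.reverse()
    return path


if __name__ == "__main__":
    # Two toy streams over two concepts; the second switches concept earlier.
    stream_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    stream_b = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(align_streams(stream_a, stream_b))
```

In this sketch, frames that carry the same concept are matched even when the two sensors observe the change at different positions in their streams, which is the property the alignment step relies on when clocks are unreliable.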
