A new model driven architecture for deep learning-based multimodal lifelog retrieval

Nowadays, taking photos and recording our lives are daily tasks for the majority of people. The recorded information has helped to build several applications, such as self-monitoring of activities, memory assistance and long-term assisted living. This trend, called lifelogging, interests many research communities, including computer vision, machine learning, human-computer interaction, pervasive computing and multimedia. Great effort has been made in the acquisition and storage of captured data, but challenges remain in managing, analyzing, indexing, retrieving, summarizing and visualizing these data. In this work, we present a new model driven architecture for deep learning-based multimodal lifelog retrieval, summarization and visualization. Our proposed approach integrates different models into an architecture organized in four phases. The first phase is data preprocessing: a Convolutional Neural Network first discards noisy images; next, we extract several features to enrich the data description; then, we generate a semantic segmentation to limit the search area and thus better control the runtime and complexity. The second phase consists of analyzing the query. The third phase, based on a Relational Network, aims at retrieving the data matching the query. The final phase handles diversity-based summarization with k-means, which offers the lifelogger a key-frame concept and context selection-based visualization.
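The final phase summarizes the retrieved images by diversity using k-means. The abstract does not spell out the procedure, so the following is only a minimal sketch under stated assumptions. It assumes each image already has a feature vector, for example descriptors produced in the first phase. It seeds the clusters deterministically with farthest-point initialization, which is our choice and not necessarily the authors'. It takes each cluster's key frame to be the image closest to the centroid. The helper names `kmeans` and `select_keyframes` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _farthest_point_init(features: Sequence[Vector], k: int) -> List[List[float]]:
    """Assumed deterministic seeding: start from the first frame, then repeatedly
    add the frame whose nearest already-chosen centroid is farthest away."""
    centroids = [list(features[0])]
    while len(centroids) < k:
        idx = max(
            range(len(features)),
            key=lambda i: min(_sq_dist(features[i], c) for c in centroids),
        )
        centroids.append(list(features[idx]))
    return centroids


def kmeans(
    features: Sequence[Vector], k: int, max_iter: int = 100
) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means; returns (centroids, cluster label of each frame)."""
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the number of frames")
    centroids = _farthest_point_init(features, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach every frame to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: _sq_dist(f, centroids[c])) for f in features
        ]
        if new_labels == labels:  # assignments stable, so the algorithm converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its member frames.
        for c in range(k):
            members = [features[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def select_keyframes(features: Sequence[Vector], k: int) -> List[int]:
    """Hypothetical helper: per cluster, keep the frame nearest to its centroid."""
    centroids, labels = kmeans(features, k)
    keyframes = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            keyframes.append(
                min(members, key=lambda i: _sq_dist(features[i], centroid))
            )
    return sorted(keyframes)  # chronological if frames are stored in time order


if __name__ == "__main__":
    # Toy 2-D "features" for nine frames forming three visually distinct moments.
    frames = [
        (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
        (10.0, 10.0), (10.2, 9.9), (9.9, 10.1),
        (20.0, 0.0), (19.8, 0.2), (20.1, -0.1),
    ]
    print(select_keyframes(frames, 3))  # [1, 3, 6]
```

Selecting one representative per cluster is what makes the summary diverse. Near-duplicate frames from the same moment fall into one cluster and contribute a single key frame, so the summary does not repeat the same scene.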
