Lifelog Moment Retrieval with Self-Attention based Joint Embedding Model

With the rapid growth of technology, personal devices such as cameras and healthcare sensors have become increasingly accessible, and many people use them to record their daily lives. This creates a growing need to exploit the resulting volume of data to better understand how people live. We therefore introduce a novel interactive system that retrieves specific moments from text-based queries. For this purpose we propose the Self-Attention based Joint Embedding Model (SAJEM). Our method first extracts visual and text features, then maps them into a single common space, and ranks candidates by cosine distance. In addition, the system has two auxiliary components, which use ResNet152 features and image metadata, to help users extend their query results. We also design a web application with an easy-to-use interface for visualizing and retrieving lifelog data. With this solution, we achieved first place in the Lifelog Moment Retrieval task of ImageCLEF Lifelog 2020, with an F1@10 score of 0.811.
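The ranking step described above (map both modalities into a common space, then sort by cosine distance) can be illustrated with a minimal sketch. This is not the SAJEM implementation: the learned self-attention encoders are replaced here by fixed linear projections, and all names (`project`, `rank_images`, `W_TEXT`, `W_IMAGE`) and the toy feature values are assumptions made purely for illustration.

```python
import math


def project(matrix, vector):
    """Apply a linear map (given as a list of rows) to a feature vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_images(text_feature, image_features, w_text, w_image):
    """Rank image ids by cosine distance to the query in the common space."""
    query = project(w_text, text_feature)
    scored = []
    for image_id, feature in image_features.items():
        embedding = project(w_image, feature)
        scored.append((cosine_distance(query, embedding), image_id))
    scored.sort()  # smallest distance first
    return [image_id for _, image_id in scored]


# Hypothetical toy data: 3-d text features and 2-d image features,
# both mapped into a shared 2-d space by stand-in projection matrices.
W_TEXT = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_IMAGE = [[1.0, 0.0], [0.0, 1.0]]
IMAGES = {"img_a": [1.0, 0.1], "img_b": [0.0, 1.0], "img_c": [-1.0, 0.0]}

ranking = rank_images([1.0, 0.0, 0.5], IMAGES, W_TEXT, W_IMAGE)
print(ranking)
```

In a real system the projections would be learned jointly so that matching image and text pairs end up close together; the sketch only shows how ranking works once such embeddings exist.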
