Holographic Feature Learning of Egocentric-Exocentric Videos for Multi-Domain Action Recognition

Though existing cross-domain action recognition methods successfully improve the performance on videos of one view (e.g., egocentric videos) by transferring knowledge from videos of another view (e.g., exocentric videos), they have limited generality because the source and target domains need to be fixed beforehand. In this paper, we propose to solve a more practical task of multi-domain action recognition on egocentric-exocentric videos. It aims to transfer knowledge between two domains with a single action recognition model, which can be directly applied to a test video from either of the two domains. To solve this task, we propose a memory-based holographic feature learning framework that predicts the action classes based on the holographic feature, which combines the visual feature extracted from the input video with the complementary feature predicted for the opposite view. To compute the holographic feature, we design a dynamic meta-hallucination module to retrieve the complementary feature from a learnable dual-memory structure, which is built on the view-specific features of egocentric and exocentric videos by writing discriminative centroids of different action classes into the memory. We demonstrate the effectiveness of the proposed method with extensive experimental results on two public datasets. Moreover, the strong performance under the semi-supervised setting demonstrates the generality of our model.
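The abstract gives no implementation details, so the following PyTorch sketch is only a conceptual illustration of the retrieval idea it describes: a learnable dual memory of per-class centroids (one memory per view), soft retrieval of a complementary feature from the opposite view's memory, and classification over the concatenated holographic feature. All module, parameter, and function names here are our own assumptions, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualMemoryHallucination(nn.Module):
    """Minimal sketch (assumed design, not the authors' code):
    dual memory of class centroids + cross-view retrieval."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        # One learnable memory of class centroids per view (ego / exo).
        self.ego_memory = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.exo_memory = nn.Parameter(torch.randn(num_classes, feat_dim))
        # Classifier over the holographic (concatenated) feature.
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, visual_feat: torch.Tensor, view: str) -> torch.Tensor:
        # Read from the memory of the *opposite* view.
        memory = self.exo_memory if view == "ego" else self.ego_memory
        # Soft attention over centroids retrieves a complementary feature.
        attn = F.softmax(visual_feat @ memory.t(), dim=-1)          # (B, C)
        complementary = attn @ memory                               # (B, D)
        # Holographic feature: visual feature + cross-view complement.
        holographic = torch.cat([visual_feat, complementary], dim=-1)
        return self.classifier(holographic)

# Usage: the same model handles a test clip from either view.
model = DualMemoryHallucination(feat_dim=512, num_classes=8)
logits = model(torch.randn(4, 512), view="ego")  # ego clip -> exo memory
```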