Exploiting Multimodality in Video Hyperlinking to Improve Target Diversity

Video hyperlinking is the process of creating links within a collection of videos to support navigation and information seeking. Given a set of video segments, called anchors, the task is to provide for each anchor a set of related segments, called targets. In recent years, a number of content-based approaches have achieved good results by retrieving target segments whose content and information closely match those of the anchor. Unfortunately, this relevance has come at the expense of diversity. In this paper, we study multimodal approaches and their ability to provide a set of diverse yet relevant targets. We compare two recently introduced cross-modal approaches, namely deep autoencoders and bimodal LDA, and show experimentally that both yield significantly more diverse targets than a state-of-the-art baseline. Bimodal autoencoders offer the best trade-off between relevance and diversity, while bimodal LDA yields slightly more diverse targets at lower precision.
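To illustrate why ranking purely by similarity to the anchor can hurt diversity, here is a minimal sketch. It assumes each segment is already represented as a fixed-length vector (for instance, a text, visual, or joint multimodal embedding). Targets are ranked by cosine similarity to the anchor, and the resulting list is scored with a mean pairwise cosine distance. The function names, the vector representation, and this particular diversity measure are illustrative assumptions. They are not the models or the evaluation protocol used in the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_targets(anchor: Vector, segments: List[Vector], k: int) -> List[int]:
    """Indices of the k segments most similar to the anchor (relevance-only ranking)."""
    order = sorted(range(len(segments)),
                   key=lambda i: cosine(anchor, segments[i]),
                   reverse=True)
    return order[:k]


def intra_list_diversity(targets: List[Vector]) -> float:
    """Mean pairwise cosine distance (1 - similarity) among targets.

    Hypothetical diversity measure: higher means more diverse; 0.0 for fewer than two targets.
    """
    n = len(targets)
    if n < 2:
        return 0.0
    dists = [1.0 - cosine(targets[i], targets[j])
             for i in range(n) for j in range(i + 1, n)]
    return sum(dists) / len(dists)


if __name__ == "__main__":
    # Toy 2-D "embeddings": segments 0 and 1 are near-duplicates of the anchor.
    anchor = [1.0, 0.0]
    segments = [[0.9, 0.1], [1.0, 0.05], [0.0, 1.0], [0.7, 0.7]]
    top = rank_targets(anchor, segments, k=2)
    print(top)  # [1, 0]
    print(intra_list_diversity([segments[i] for i in top]))
```

In this toy example, the two most similar segments are almost identical to each other, so the top-ranked list has near-zero diversity. Swapping in the less similar segment 3 lowers relevance but raises diversity, which is the trade-off the abstract refers to.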
