Proceedings of the Third Edition Workshop on Speech, Language & Audio in Multimedia

Welcome to SLAM 2015 in Brisbane, Australia! SLAM 2015 is the third edition of the SLAM workshop series, which brings together leading researchers from around the world in speech, language and audio processing applied to multimedia material or in a multimedia context. From the very beginning, the workshop has been steered and supported by the Special Interest Group of the International Speech Communication Association on Speech and Language in Multimedia, and this year's edition follows that tradition. SLAM is by nature interdisciplinary, existing at the intersection of multiple scientific communities: music and audio processing, speech processing, natural language processing and, of course, multimedia.

After co-locating the first two editions of SLAM with Interspeech, the premier international conference in the field of speech communication, we are very proud to hold SLAM 2015 with ACM Multimedia. This is a logical continuation of the preceding editions and reflects the fact that the focus of SLAM goes far beyond speech processing to genuinely account for the multiple facets of multimedia. Our long-term goal is to establish SLAM as a regular workshop, alternating between major speech and language conferences and major multimedia conferences, as a bridge between these domains. This year's edition is a first step in this direction, and we are very grateful to the ACM Multimedia General and Workshop chairs for their support in the development of SLAM in spite of possible interference with the main conference.

The 2015 program covers a wide range of problems related to SLAM topics, with contributions on music, speech and language, but also on computer vision. To emphasize the links between audio, speech, language and multimedia, the workshop features a special session on video hyperlinking, a task recently introduced in international benchmark initiatives such as MediaEval and TRECVid. The multimodal nature of the video hyperlinking task makes it an emblematic case study in which the speech and language modalities are complemented by audio and vision. The session gathers contributions in which audio and natural language processing are used for video hyperlinking, possibly in conjunction with image processing and computer vision.

A panel discussion on the past, present and future of hyperlinking will conclude the workshop. The panel aims to identify which approaches are most promising and how they can be evaluated. The goal is to shape research directions at the crossroads of the scientific communities involved in SLAM and to nurture future implementations of video hyperlinking benchmarks.

Martha Larson | Gareth J. F. Jones | Guillaume Gravier | Roeland Ordelman