VisPod: Content-Based Audio Visual Navigation
Current audio player interfaces generally display only basic information, such as title and duration, and support simple playback controls. These features alone are not sufficient for certain user tasks, such as quickly finding a previously visited location or browsing the main topics covered in the audio content. We present VisPod, a visual audio player that displays the main topics and keywords extracted from the transcript. VisPod supports (1) audio content browsing, (2) topic-based and keyword-based navigation, (3) real-time communication of transcript and speaker information, and (4) content-based queries. VisPod encodes the audio as a donut chart composed of topic segments: text processing algorithms segment the transcript into independent topics, and a deep learning model generates human-readable topic names. An informal study suggests users prefer VisPod over traditional audio playback approaches, specifically for its benefits in audio browsing and navigation.
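To illustrate the kind of pipeline the abstract describes, the sketch below segments a transcript into topic blocks and extracts keywords for each block. It is a minimal, illustrative approximation, not the authors' implementation: the function names, window sizes, thresholds, and the simple co-occurrence scoring are assumptions, and the deep-learning topic naming step is omitted.

```python
# Minimal sketch of a VisPod-style transcript pipeline (illustrative only).
# The segmentation and keyword heuristics below are simplified stand-ins
# for the text processing algorithms described in the abstract.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "this", "we", "you", "for", "on", "with", "as", "are"}

def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def segment_topics(sentences, window=4, threshold=0.15):
    """TextTiling-flavoured segmentation: place a topic boundary wherever the
    lexical overlap between adjacent sentence windows drops below a threshold."""
    boundaries = [0]
    for i in range(window, len(sentences) - window):
        left = Counter(w for s in sentences[i - window:i] for w in tokenize(s))
        right = Counter(w for s in sentences[i:i + window] for w in tokenize(s))
        shared = sum((left & right).values())
        total = sum(left.values()) + sum(right.values())
        if total and shared / total < threshold:
            boundaries.append(i)
    boundaries.append(len(sentences))
    return [" ".join(sentences[a:b]) for a, b in zip(boundaries, boundaries[1:])]

def keywords(segment, top_k=5):
    """Graph-style keyword scoring via word co-occurrence counts
    (a rough stand-in for a TextRank-like extractor)."""
    words = tokenize(segment)
    score = Counter()
    for i, w in enumerate(words):
        for v in words[i + 1:i + 5]:  # co-occurrence window of 4 words
            if v != w:
                score[w] += 1
                score[v] += 1
    return [w for w, _ in score.most_common(top_k)]

if __name__ == "__main__":
    transcript = ("Welcome to the show. Today we talk about podcasts. "
                  "Podcasts are audio programs. Navigation in audio is hard. "
                  "Visualization can help listeners. Charts show topic structure. "
                  "A donut chart encodes segment lengths. Colors separate topics.")
    sentences = re.split(r"(?<=[.!?])\s+", transcript)
    for seg in segment_topics(sentences, window=2, threshold=0.3):
        print(keywords(seg), "->", seg[:60], "...")
```

In VisPod's design, each resulting segment would become one arc of the donut chart, with its keywords and generated topic name shown for navigation; the code above only produces the segments and keywords that such a view would consume.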