Paper information - Effective browsing of long audio recordings

Effective browsing of long audio recordings

Timeliner is a browser for long audio recordings and features that it derives from such recordings. Features can be either signal-based, like spectrograms, or model-based, like categorical classifiers. Unlike conventional audio editors, Timeliner pans and zooms smoothly across many orders of magnitude, from days-long overviews to millisecond-scale details, with zero latency, zero flicker, and low CPU load. Also, to suggest which details are worth zooming in to examine, Timeliner's agglomerative hierarchical caches propagate feature-specific details up to wider zoom levels. Because these details are not averaged away, "big data" can be browsed rapidly and effectively. Several studies demonstrate this.
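The idea of a hierarchical cache that carries details upward instead of averaging them away can be illustrated with a small sketch. The abstract does not specify Timeliner's cache layout or its per-feature reduction rules, so everything below is an assumption made for illustration: a binary pyramid, a min/max reduction, and the hypothetical names `build_pyramid` and `pick_level`.

```python
from typing import List, Tuple

MinMax = Tuple[float, float]


def build_pyramid(samples: List[float]) -> List[List[MinMax]]:
    """Build a binary min/max pyramid (an assumed reduction, not Timeliner's own).

    Level 0 holds one (min, max) pair per sample. Each higher level merges
    adjacent pairs of the level below, so extremes survive at every scale.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    level: List[MinMax] = [(s, s) for s in samples]
    pyramid = [level]
    while len(level) > 1:
        merged: List[MinMax] = []
        for i in range(0, len(level), 2):
            group = level[i:i + 2]  # the last group may hold a single node
            merged.append((min(lo for lo, _ in group),
                           max(hi for _, hi in group)))
        pyramid.append(merged)
        level = merged
    return pyramid


def pick_level(pyramid: List[List[MinMax]], max_columns: int) -> List[MinMax]:
    """Return the finest level that fits in max_columns display columns."""
    for level in pyramid:
        if len(level) <= max_columns:
            return level
    return pyramid[-1]  # fall back to the single-node root


if __name__ == "__main__":
    # A flat signal with one brief spike: averaging 64 samples per column
    # would shrink the spike to 9/64, while the max keeps it at full height.
    samples = [0.0] * 1000
    samples[617] = 9.0
    overview = pick_level(build_pyramid(samples), max_columns=16)
    print(max(hi for _, hi in overview))
```

In this sketch, a zoomed-out view drawn from `pick_level` still shows the spike in exactly one column, which hints at where to zoom in. A categorical feature would need a different reduction (for example, keeping the rarest or most salient label), which is one plausible reading of "feature-specific" in the abstract.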

Camille Goudeseune
