Multiresolution alignment for multiple unsynchronized audio sequences using sequential Monte Carlo samplers

Abstract It is increasingly common for an event to be recorded by multiple individuals, owing to the proliferation of recording devices such as smartphones. When properly aligned, these recordings can provide several audio and visual perspectives on a scene, which enables applications in restoration, remastering, and remixing across various fields. In this work, we propose a multiresolution alignment algorithm for aligning multiple unsynchronized audio sequences using Sequential Monte Carlo samplers. We employ a model-based approach and a score function analogous to those of similarity-based methods. The optimum alignments are obtained in a coarse-to-fine fashion using multiresolution sampling and a heuristic sequential search method. The proposed method is evaluated on real-life data from the Jiku Mobile Video Dataset. The simulation results suggest that, with a suitable choice of parameters, our method is competitive with the baseline methods in terms of accuracy.
