Fast second screen TV synchronization combining audio fingerprint technique and generalized cross correlation

Emerging second screen TV applications need a technique that ensures fast and accurate synchronization of media components streamed over different networks to different rendering devices. A particularly useful approach is to exploit the unmodified audio stream of the original media and compare it to a reference version. We consider two major approaches for this purpose, namely fingerprinting techniques and generalized cross correlation, where the former can greatly reduce computational cost and the latter can offer sample-accurate synchronization. We propose an approach combining these two techniques: coarse, frame-accurate synchronization positions are first found by fingerprint matching, and a candidate sample-accurate synchronization position is then verified by generalized cross correlation with phase transform (GCC-PHAT). Experimental results in a real-world setting confirm the accuracy and speed of the proposed approach.
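The refinement stage can be illustrated with a minimal sketch, not the authors' implementation. It assumes the fingerprint stage has already produced a coarse offset (in samples) into the reference audio, and it uses GCC-PHAT only to search a small window of lags around that offset. The function names `gcc_phat` and `refine_offset`, the `search_radius` parameter, and the use of the correlation peak height as a verification score are all illustrative assumptions.

```python
import numpy as np


def gcc_phat(sig, ref, max_lag):
    """Estimate the delay d (in samples) such that sig[n] ~ ref[n - d].

    Returns (d, peak), where peak is the height of the PHAT-weighted
    correlation at the chosen lag. Only lags in [-max_lag, max_lag] are searched.
    """
    n = len(sig) + len(ref)
    nfft = 1 << (n - 1).bit_length()          # zero-pad so correlation is linear
    max_lag = min(max_lag, nfft // 2 - 1)

    cross = np.fft.rfft(sig, nfft) * np.conj(np.fft.rfft(ref, nfft))
    cross /= np.abs(cross) + 1e-12            # phase transform: keep phase only
    cc = np.fft.irfft(cross, nfft)

    # Reorder so index 0 is lag -max_lag and index max_lag is lag 0.
    window = np.roll(cc, max_lag)[: 2 * max_lag + 1]
    best = int(np.argmax(window))
    return best - max_lag, float(window[best])


def refine_offset(captured, reference, coarse_offset, search_radius):
    """Refine a frame-accurate offset to a sample-accurate one.

    coarse_offset is assumed to come from a fingerprint matcher (not shown).
    Returns (offset, peak): the estimated position of captured[0] in
    reference, and the correlation peak, which a caller might compare
    against a threshold (hypothetical) to accept or reject the candidate.
    """
    segment = reference[coarse_offset : coarse_offset + len(captured)]
    delay, peak = gcc_phat(captured, segment, search_radius)
    return coarse_offset - delay, peak
```

Restricting the lag search to `search_radius` is what makes the combination cheap: the expensive cross correlation runs only over a short segment near each fingerprint candidate rather than over the whole reference.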
