Comparative study of methods for reducing dimensionality of MPEG-7 audio signature descriptors

In this paper, we study how to reduce the dimensionality of MPEG-7 audio signature descriptors. Dimension-reduced descriptors can significantly shorten the comparison time needed to detect copyrighted audio. The methods studied are block average, principal component analysis (PCA), the Hadamard transform, the Haar transform, and the Cohen-Daubechies-Feauveau (CDF) 9/7 wavelet transform. For the latter four methods, we also examine whether the choice of partition method affects accuracy. The simulation results show that different reduction methods require different partition strategies to achieve their best accuracy. We also compare the computational complexity of the methods. The experimental results show that, except for the CDF 9/7 method, the methods yield comparable accuracy for undistorted and MP3-coded audio. When computational complexity is also taken into account, the block average method is the better choice.
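To make the block average idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a descriptor is a flat list of numbers and that reduction replaces each run of consecutive values with its mean. The function names `block_average` and `squared_distance`, the block size of 4, the sample vectors, and the squared Euclidean distance are all illustrative assumptions; the abstract does not specify any of them.

```python
from typing import List, Sequence


def block_average(values: Sequence[float], block_size: int) -> List[float]:
    """Replace each run of `block_size` consecutive values with its mean.

    A trailing partial block (when the length is not a multiple of
    `block_size`) is averaged over the values it actually contains.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    reduced = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        reduced.append(sum(block) / len(block))
    return reduced


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed similarity measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical descriptor vectors, for illustration only.
query = [1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 6.0, 8.0]
reference = [1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 6.0, 12.0]

# Reducing 8 values to 2 means each comparison touches 4x fewer terms.
reduced_query = block_average(query, 4)
reduced_reference = block_average(reference, 4)

print(reduced_query)
print(squared_distance(reduced_query, reduced_reference))
```

With a block size of 4, each comparison between reduced descriptors involves a quarter as many terms, which is the source of the shorter comparison time. The reduction itself needs only additions and one division per block, which is consistent with the low computational complexity the abstract attributes to this method.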
