Audio Signal Processing Using Time-Frequency Approaches: Coding, Classification, Fingerprinting, and Watermarking

Audio signals are information-rich, nonstationary signals that play an important role in our day-to-day communication, perception of the environment, and entertainment. Because of their nonstationary nature, time-only or frequency-only approaches are inadequate for analyzing these signals; a joint time-frequency (TF) approach is a better choice for processing them efficiently. In this digital era, compression, intelligent indexing for content-based retrieval, classification, and protection of digital audio content are a few of the areas that cover the majority of audio signal processing applications. In this paper, we present a comprehensive array of TF methodologies that successfully address applications in all of the above-mentioned areas. A TF-based audio coding scheme with a novel psychoacoustic model, music classification, audio classification of environmental sounds, audio fingerprinting, and audio watermarking are presented to demonstrate the advantages of using time-frequency approaches in analyzing and extracting information from audio signals.
