Specific Selection of FFT Amplitudes from Audio Sports and News Broadcasting for Classification Purposes

In this paper we investigate the problem of classifying audio files as sports or news broadcasting. News files consist of speech with music or background noise, while sports files consist of speech over a noisy background. More specifically, the study addresses feature extraction and the training and classification procedures. We compare the Average Magnitude Difference Function (AMDF) method, which we consider more robust to background noise, with a novel proposed method. The proposed method uses several spectral audio features, which may be regarded as specific semantic information, and extracts them with an Onion Algorithm (OA) drawn from the theory of computational geometry. We tested both the classification performance and the learning ability of the two methods using a Learning Vector Quantizer One (LVQ1) neural network. The experimental results showed that the OA method learns faster, and we therefore characterise it as an accurate feature extraction method for several audio cases.
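As a rough illustration of the onion-layer idea applied to FFT amplitudes, the sketch below peels successive convex hulls from a set of 2D points. It is not the authors' implementation. Treating each spectral bin as a (frequency, amplitude) point is an assumption made here, the names `fft_points`, `convex_hull` and `onion_layers` are hypothetical, and the abstract does not say which layer or layers are kept as features.

```python
import numpy as np


def fft_points(signal, sample_rate):
    """Return (frequency, amplitude) pairs from the one-sided FFT of a signal.

    Assumption: each spectral bin is treated as one 2D point.
    """
    amplitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return [(float(f), float(a)) for f, a in zip(freqs, amplitudes)]


def _cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def onion_layers(points):
    """Peel convex hulls repeatedly until no points remain.

    Returns a list of layers, outermost first. Points lying on a hull
    edge but not at a vertex are left for a later layer.
    """
    remaining = set(points)
    layers = []
    while remaining:
        hull = convex_hull(list(remaining))
        layers.append(hull)
        remaining -= set(hull)
    return layers


if __name__ == "__main__":
    # Synthetic stand-in for an audio frame: a 50 Hz tone plus noise.
    rng = np.random.default_rng(0)
    t = np.arange(400) / 400.0
    frame = np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(400)
    layers = onion_layers(fft_points(frame, sample_rate=400))
    print(len(layers), "layers; outermost has", len(layers[0]), "vertices")
```

In a pipeline like the one described, the vertices of a chosen layer could serve as a compact feature vector for a classifier such as LVQ1. How the vertices are ordered and padded to a fixed length is a design choice left open here.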
