Dynamic and scalable audio classification by collective network of binary classifiers framework: An evolutionary approach

In this paper, we propose a novel framework based on a collective network of evolutionary binary classifiers (CNBC) to address the problems of feature and class scalability. The main goal of the proposed framework is to achieve high classification performance over dynamic audio and video repositories. The framework adopts a "divide and conquer" approach in which an individual network of binary classifiers (NBC) is allocated to discriminate each audio class. An evolutionary search is applied to find the best binary classifier in each NBC with respect to a given criterion. Through incremental evolution sessions, the CNBC framework can dynamically adapt to each new incoming class or feature set without resorting to full-scale re-training or re-configuration. The CNBC framework is therefore designed specifically for dynamically varying databases, to which conventional static classifiers cannot adapt. In short, it is an entirely novel topology and an unprecedented approach for dynamic, content- and data-adaptive, scalable audio classification. A large set of audio features can be used effectively in the framework, where the CNBC makes appropriate selections and combinations so as to achieve the highest discrimination among individual audio classes. Experiments demonstrate high classification accuracy (above 90%) and efficiency of the proposed framework over large and dynamic audio databases.
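The divide-and-conquer structure described above can be illustrated with a small sketch. It is not the authors' implementation, and every name in it (`BinaryClassifier`, `search_best`, `CNBC`, `add_class`, `add_feature_set`) is hypothetical. Several simplifications are assumed: each binary classifier is a toy nearest-centroid scorer, a random search over feature-dimension masks stands in for the evolutionary search, each NBC fuses its classifiers by averaging their scores, and every class is assumed to provide every feature set. When a class is added, an existing classifier is re-searched only if it wrongly accepts items of the new class, which is one possible reading of "incremental evolution".

```python
import random

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b, mask):
    return sum((a[i] - b[i]) ** 2 for i in mask)

class BinaryClassifier:
    """Toy one-vs-rest scorer: nearest centroid over a subset of dimensions."""
    def __init__(self, pos, neg, mask):
        self.mask, self.pos_c, self.neg_c = mask, centroid(pos), centroid(neg)

    def score(self, x):
        # Positive when x lies closer to the class centroid than to the rest.
        return sq_dist(x, self.neg_c, self.mask) - sq_dist(x, self.pos_c, self.mask)

    def accuracy(self, pos, neg):
        hits = sum(self.score(x) > 0 for x in pos) + sum(self.score(x) <= 0 for x in neg)
        return hits / (len(pos) + len(neg))

def search_best(pos, neg, rng, trials=20):
    """Assumed stand-in for the evolutionary search: random search over masks."""
    dim = len(pos[0])
    best = BinaryClassifier(pos, neg, list(range(dim)))
    best_acc = best.accuracy(pos, neg)
    for _ in range(trials):
        mask = [i for i in range(dim) if rng.random() < 0.5] or [rng.randrange(dim)]
        cand = BinaryClassifier(pos, neg, mask)
        acc = cand.accuracy(pos, neg)
        if acc > best_acc:  # keep the incumbent unless strictly improved
            best, best_acc = cand, acc
    return best

class CNBC:
    """One NBC (dict of per-feature-set binary classifiers) per class."""
    def __init__(self, data, seed=0):
        # data: {label: {feature_set_name: [vectors]}}
        self.rng = random.Random(seed)
        self.data = {label: dict(sets) for label, sets in data.items()}
        self.nbcs = {label: {} for label in data}
        for label, sets in data.items():
            for fs in sets:
                self._evolve(label, fs)

    def _evolve(self, label, fs):
        pos = self.data[label][fs]
        neg = [x for other, sets in self.data.items() if other != label for x in sets[fs]]
        self.nbcs[label][fs] = search_best(pos, neg, self.rng)

    def add_class(self, label, sets):
        """Create a new NBC; touch existing ones only where they fail."""
        self.data[label] = dict(sets)
        self.nbcs[label] = {}
        for fs, vectors in sets.items():
            self._evolve(label, fs)
            for other in self.nbcs:
                if other != label and any(self.nbcs[other][fs].score(x) > 0 for x in vectors):
                    self._evolve(other, fs)

    def add_feature_set(self, fs, vectors_by_label):
        """Add one binary classifier per NBC; existing classifiers stay as they are."""
        for label, vectors in vectors_by_label.items():
            self.data[label][fs] = vectors
        for label in self.nbcs:
            self._evolve(label, fs)

    def classify(self, sample):
        # sample: {feature_set_name: vector}; the highest fused score wins.
        def fused(label):
            scores = [bc.score(sample[fs]) for fs, bc in self.nbcs[label].items()]
            return sum(scores) / len(scores)
        return max(self.nbcs, key=fused)

# Usage with made-up 2-D "mfcc" vectors (illustrative data only).
data = {
    "speech": {"mfcc": [[0.0, 0.0], [0.2, 0.1], [0.1, 0.2]]},
    "music": {"mfcc": [[4.0, 0.0], [4.2, 0.1], [4.1, 0.2]]},
}
net = CNBC(data)
net.add_class("noise", {"mfcc": [[0.0, 4.0], [0.2, 4.1], [0.1, 4.2]]})
print(net.classify({"mfcc": [0.1, 3.9]}))
```

In this sketch, adding the "noise" class trains one new NBC and re-searches only the "speech" classifier, which had wrongly accepted the new items; the "music" classifier is left unchanged.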
