Discrimination and retrieval of animal sounds

Until recently, little research has been performed in the area of animal sound retrieval. We identify state-of-the-art techniques in general-purpose sound recognition through a broad survey of the literature. Based on the findings, this paper gives a thorough investigation of audio features and classifiers and their applicability in the domain of animal sounds. We introduce a set of novel audio descriptors and compare their quality to other popular features. The results are encouraging and motivate further research in this domain.
