Joint localization and fingerprinting of sound sources for auditory scene analysis

Research in scene understanding has focused mainly on extracting the elements of a scene from video and images, an approach with high computational and monetary cost. This paper proposes a low-cost system that uses sound-based techniques to jointly localize and fingerprint sound sources. A network of embedded nodes senses the sound inputs. Phase-based sound localization and support vector machine (SVM) classification are used to locate and classify the elements of the scene, respectively. Fusing these results yields a complete “picture” of the scene. The proposed concepts are applied to a vehicular-traffic case study. Experiments show that the system achieves a fingerprinting accuracy of up to 97.5%, a localization error below 4 degrees, and a scene prediction accuracy of 100%.
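
The abstract names phase-based localization but gives no details. The sketch below illustrates only the general idea and is not the authors' method. It rests on several assumptions that the abstract does not state:

- a single pair of microphones;
- a narrowband source at a known frequency;
- far-field (plane-wave) propagation;
- a speed of sound of 343 m/s.

The sketch takes the phase of that frequency at each microphone with a single-bin DFT. It converts the phase difference into a time delay and maps the delay to a bearing with the relation sin θ = c·τ/d. The names `tone_phase`, `bearing_from_phase`, and `SPEED_OF_SOUND` are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 °C


def tone_phase(samples, freq_hz, fs_hz):
    """Phase (radians) of the freq_hz component, via a single-bin DFT."""
    acc = 0j
    for n, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz)
    return cmath.phase(acc)


def bearing_from_phase(mic_a, mic_b, freq_hz, fs_hz, spacing_m, c=SPEED_OF_SOUND):
    """Angle of arrival in degrees, measured from the broadside of the pair.

    Positive angles mean the wavefront reaches mic_a before mic_b.
    Assumes a far-field narrowband source at freq_hz.
    """
    # Beyond half a wavelength, different bearings give the same wrapped phase.
    if spacing_m >= c / (2 * freq_hz):
        raise ValueError("spacing too large: phase difference is ambiguous")
    dphi = tone_phase(mic_a, freq_hz, fs_hz) - tone_phase(mic_b, freq_hz, fs_hz)
    dphi = math.atan2(math.sin(dphi), math.cos(dphi))  # wrap into (-pi, pi]
    delay_s = dphi / (2 * math.pi * freq_hz)  # how long mic_b lags mic_a
    ratio = max(-1.0, min(1.0, c * delay_s / spacing_m))  # clamp noise overshoot
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    fs, f, d = 16000, 500.0, 0.2  # 1600 samples = exactly 50 cycles of 500 Hz
    tau = d * math.sin(math.radians(30.0)) / SPEED_OF_SOUND
    a = [math.sin(2 * math.pi * f * n / fs) for n in range(1600)]
    b = [math.sin(2 * math.pi * f * (n / fs - tau)) for n in range(1600)]
    print(round(bearing_from_phase(a, b, f, fs, d), 3))
```

The demo synthesizes a 500 Hz tone that arrives 30 degrees off broadside and recovers that angle. The window holds a whole number of cycles, so the single-bin DFT has no leakage. Real vehicle sounds are broadband and noisy. A practical system would therefore combine estimates across several frequencies and nodes, and the abstract does not specify how.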