Data Mining Sound Archives: A New Scalable Algorithm for Parallel-Distributing Processing

This paper presents a new algorithm, the acoustic data-mining accelerator (ADA), developed to mine large sound archives for signals of interest, including animal vocalizations. Background on the development of ADA is provided, along with a summary of projects that have used the technology since 2009. Performance was evaluated by comparing runtimes and efficiency metrics for two marine mammal detection algorithms applied to a 3-week, single-channel acoustic data set (sampled at 192 kHz with 16-bit resolution). Four configurations (1, 8, 16 and 64 workers) were used to demonstrate processing scalability. Each detection algorithm successfully processed the data set in all four configurations without any change to the ADA algorithm. The fastest case (64 workers) had a total runtime of 1.5 hours, making ADA 13 times more efficient than the serial case: with a single worker, processing the same 3-week data set took more than 18 hours. Concurrent processing of both data-mining algorithms with 64 workers showed the highest efficiency gain (23x) relative to processing the data sequentially with a single worker.
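The figures quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of ADA itself. It assumes that "3-week" means exactly 21 days of continuous recording and that the audio is stored as uncompressed PCM. It also uses the textbook definition of parallel efficiency (speedup divided by worker count), which may differ from the efficiency metrics used in the paper. All function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def raw_bytes(days, sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM size in bytes (assumes continuous recording)."""
    samples = days * SECONDS_PER_DAY * sample_rate_hz * channels
    return samples * bits_per_sample // 8


def speedup(serial_hours, parallel_hours):
    """Ratio of serial runtime to parallel runtime."""
    return serial_hours / parallel_hours


def parallel_efficiency(speedup_factor, workers):
    """Textbook parallel efficiency: speedup per worker (1.0 is ideal)."""
    return speedup_factor / workers


# Data set described in the abstract: 3 weeks, 1 channel, 192 kHz, 16 bit.
archive_bytes = raw_bytes(days=21, sample_rate_hz=192_000, bits_per_sample=16)

# The abstract gives 1.5 h for 64 workers and a 13x gain, but only
# "more than 18 hours" for the serial run. The serial runtime implied
# by those two numbers is therefore an inference, not a reported value.
implied_serial_hours = 13 * 1.5

print(f"raw archive size: {archive_bytes / 1e9:.1f} GB")
print(f"implied serial runtime: {implied_serial_hours} h")
print(f"speedup check: {speedup(implied_serial_hours, 1.5):.1f}x")
print(f"efficiency at 64 workers: {parallel_efficiency(13, 64):.3f}")
```

Under these assumptions the archive is roughly 0.7 TB of raw audio, and the implied serial runtime is consistent with the stated "more than 18 hours". A 13x gain on 64 workers corresponds to a per-worker efficiency of about 0.2 under the textbook definition.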
