This Convention paper was selected based on a submitted abstract and 750-word precis that have been peer reviewed by at least two qualified anonymous reviewers. The complete manuscript was not peer reviewed. This convention paper has been reproduced from the author's advance manuscript without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio

ABSTRACT

Numerous methods have been proposed for searching and analyzing long-term audio recordings for specific sound sources. Increasingly, such recordings are archived using perceptual compression, such as MPEG-1 Layer 3 (MP3). Rather than performing sound identification on the time waveform reconstructed after decoding, we operate on the undecoded MP3 audio data to improve processing speed and efficiency. The compressed audio is only partially processed, using the initial bitstream unpacking stage of a standard decoder, and sound identification is then performed directly on the frequency spectrum represented by each MP3 data frame. Practical uses are demonstrated for identifying anthropogenic sounds within a natural soundscape recording.

1. INTRODUCTION

Monitoring noise levels at a specific location with audio recordings is not uncommon. Analyzing the resulting recordings, however, is a much more difficult task. As researchers record more audio at a particular location, the time needed to analyze it grows along with the recording length. While analysis software that provides basic statistics is common, little exists to actually identify and classify the sounds in a recording. In soundscape analysis and audio forensic investigation, this problem continues to grow as inexpensive equipment makes long-term audio recording more feasible.
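To make the "initial bitstream unpacking" mentioned in the abstract concrete, the following is a minimal sketch of its first step: reading the 32-bit header of one MPEG-1 Layer 3 frame and deriving the frame length. It is not the implementation used in this paper. The function name `parse_mp3_frame_header` and the returned dictionary keys are hypothetical, and the sketch assumes MPEG-1 Layer 3 data only (no MPEG-2 extensions, no free-format bitrate).

```python
import struct

# Bitrate (kbit/s) and sampling-rate (Hz) tables for MPEG-1 Layer 3.
# Index 0 (free format) and index 15 (invalid) are not handled in this sketch.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def parse_mp3_frame_header(header_bytes):
    """Decode the 4-byte header of one MPEG-1 Layer 3 frame (hypothetical helper)."""
    if len(header_bytes) < 4:
        raise ValueError("need at least 4 bytes")
    (word,) = struct.unpack(">I", header_bytes[:4])

    # The header starts with an 11-bit sync word of all ones.
    if (word >> 21) & 0x7FF != 0x7FF:
        raise ValueError("sync word not found")

    version_id = (word >> 19) & 0x3   # 0b11 means MPEG-1
    layer_id = (word >> 17) & 0x3     # 0b01 means Layer 3
    if version_id != 0b11 or layer_id != 0b01:
        raise ValueError("not an MPEG-1 Layer 3 frame")

    bitrate_kbps = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate_hz = SAMPLE_RATES_HZ[(word >> 10) & 0x3]
    if bitrate_kbps is None or sample_rate_hz is None:
        raise ValueError("unsupported bitrate or sampling rate")

    padding = (word >> 9) & 0x1
    # Each MPEG-1 Layer 3 frame carries 1152 samples per channel, which gives
    # the usual frame-length formula: floor(144 * bitrate / fs) + padding.
    frame_bytes = (144 * bitrate_kbps * 1000) // sample_rate_hz + padding

    return {
        "bitrate_kbps": bitrate_kbps,
        "sample_rate_hz": sample_rate_hz,
        "padding": padding,
        "has_crc": ((word >> 16) & 0x1) == 0,  # protection bit 0 means CRC present
        "channel_mode": (word >> 6) & 0x3,     # 0b11 means single channel
        "frame_bytes": frame_bytes,
    }


# Header bytes of a 128 kbit/s, 44.1 kHz stereo frame without CRC or padding.
info = parse_mp3_frame_header(bytes([0xFF, 0xFB, 0x90, 0x00]))
print(info["bitrate_kbps"], info["sample_rate_hz"], info["frame_bytes"])
```

Stepping from header to header by `frame_bytes` lets a program walk a long recording frame by frame. The side information and Huffman-coded spectral data that follow each header are the part of the frame a frequency-domain analysis would unpack next.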
These long-term recordings can be created anywhere and contain all kinds of natural and culturally created sounds [1]. While the possible applications for analyzing long-term recordings are vast, this paper focuses on the National Park Service's interest in long-term recording and analysis in the National Parks. The Natural Sounds and Night Skies Division (NSNSD) is interested in scientifically measuring background sound levels in the parks and determining how the levels of cultural sounds affect the environment. The National Park Service describes these interests in its Management Policies 2006 report: "The Service will restore to the natural condition wherever possible those park soundscapes that have become degraded by unnatural sounds (noise), and will
[1] Daniel P. W. Ellis et al., "Fingerprinting to Identify Repeated Sound Events in Long-Duration Personal Audio Recordings," 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), 2007.
[2] Robert C. Maher et al., "Automatic Search and Classification of Sound Sources in Long-Term Surveillance Recordings," 2012.
[3] George Tzanetakis et al., "Sound analysis using MPEG compressed audio," 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000.
[4] S. Levitus et al., US Government Printing Office, 1998.
[5] Robert C. Maher, "Acoustics of National Parks and Historic Sites: the 8,760 hour MP3 File," 2009.
[6] Gregory L. Zick et al., "Speech recognition on MPEG/Audio encoded files," Proceedings of IEEE International Conference on Multimedia Computing and Systems, 1997.