ABSTRACT

Current methods of musical note onset detection commonly have difficulty with fast passages of musical audio, with detecting all onsets in a passage that has a wide dynamic range, and with detecting onsets of varying types, as in multi-instrumental music. We present a method that uses a subband decomposition approach to onset detection. An energy-based detector is applied to the upper subbands to detect strong transient events. This gives precise onset times but misses softer or weaker onsets. A frequency-based distance measure is therefore formulated for the lower subbands, improving the detection accuracy of softer onsets.

We also present a method for improving the detection function by using a smoothed difference metric. Finally, we show that the detection threshold may be set automatically from the statistics of the detection function, with results comparable in most cases to manually set thresholds.

1. BACKGROUND

Note onset detection aims to find the start of musical events from the audio signal itself. It is an essential component of many larger systems, such as automatic music transcription schemes and non-linear time scaling [1], and of many new audio effects and editing tools, such as 'Beat Detective' [2] from Digidesign. Many synthesis applications also need to isolate the attack portions of notes.

Despite several proposed solutions, onset detection remains an unsolved, and often over-simplified, problem. Traditional methods such as high frequency content detection rely on the assumption that all note onsets contain high frequency energy [3]. For most instruments, it is fair to assume that a note contains more high frequency energy at its onset.
However, in real-world audio, high notes with considerable high frequency energy at their onsets may occur in the same region as low notes with weak high frequency energy, and the lower notes then become almost impossible to detect from the detection function. This work addresses this problem directly.

Musical signals exhibit a range of instrument onset types. Figure 1 shows short sections of signals from a guitar and a violin. The guitar is a string instrument that is played percussively, producing 'hard' note onsets that appear as wide-band noise in the spectrogram. For this type of instrument, high frequency content is a useful basis for detection. The violin in this figure, by contrast, is a bowed string instrument with a 'soft' onset. Its strings are excited by the stick-slip motion caused by friction between bow and string. Because the notes are excited continuously, there is little or no decay. Here, the change in frequency content, particularly at lower frequencies, is the best guide to note onsets. Most everyday musical signals contain a mixture of hard and soft onsets.

Figure 1: Short sections of signals from a guitar and a violin.
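The contrast drawn above between hard and soft onsets can be illustrated with two frame-based detection functions. This is a minimal sketch under stated assumptions, not the method of this paper: `hfc` uses one common high-frequency-content weighting (each bin's energy multiplied by its bin index), which may differ from the formulation in [3], and `spectral_difference` is a generic half-wave-rectified magnitude difference standing in for the frequency-based distance measure, which this section does not specify. The function names, the frame length (64), hop (32), `max_bin` cutoff and the synthetic `pitch_change` signal are all hypothetical choices.

```python
# Illustrative sketch only: all names and parameters are hypothetical,
# and neither function is claimed to be the detector described in this paper.
import cmath
import math


def frame_spectra(signal, frame_len=64, hop=32):
    """Magnitude spectra (bins 0..frame_len//2) of Hann-windowed frames.

    Uses a naive O(N^2) DFT so the sketch needs only the standard library.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        spectra.append(mags)
    return spectra


def hfc(spectra):
    """High-frequency-content function: each bin's energy weighted by its bin index k."""
    return [sum(k * m * m for k, m in enumerate(mags)) for mags in spectra]


def spectral_difference(spectra, max_bin=None):
    """Sum of positive magnitude increases between consecutive frames.

    max_bin optionally restricts the sum to bins 0..max_bin, i.e. the lower
    frequencies where soft onsets such as bowed note changes show up.
    Frame 0 has no predecessor and is assigned 0.0.
    """
    out = [0.0]
    for prev, cur in zip(spectra, spectra[1:]):
        stop = len(cur) if max_bin is None else max_bin + 1
        out.append(sum(max(c - p, 0.0) for p, c in zip(prev[:stop], cur[:stop])))
    return out


def pitch_change(n_samples=1024, frame_len=64, bin_a=3, bin_b=5):
    """Constant-amplitude tone that jumps from bin_a to bin_b halfway through:
    a 'soft' onset with no attack transient and no decay."""
    half = n_samples // 2
    return [math.sin(2 * math.pi * (bin_a if n < half else bin_b) * n / frame_len)
            for n in range(n_samples)]


if __name__ == "__main__":
    spectra = frame_spectra(pitch_change())
    sd = spectral_difference(spectra, max_bin=8)
    print("spectral-difference peak at frame", sd.index(max(sd)))
    print("HFC first/last frame:", round(hfc(spectra)[0]), round(hfc(spectra)[-1]))
```

On this synthetic tone, which changes pitch without any attack transient or decay, the low-bin spectral difference stays essentially at zero on the steady frames and rises only on the frames that straddle the pitch change, whereas the HFC simply steps from one constant level to another. A practical implementation would use an FFT rather than the naive DFT used here to stay dependency-free.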
REFERENCES

[1] E. Owens et al., "An Introduction to the Psychology of Hearing," 1997.
[2] Heekuck Oh et al., "Neural Networks for Pattern Recognition," Adv. Comput., 1993.
[3] Teresa H. Y. Meng et al., "Transient Modeling Synthesis: a flexible analysis/synthesis tool for transient signals," ICMC, 1998.
[4] Anssi Klapuri et al., "Sound onset detection by applying psychoacoustic knowledge," in Proc. 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP99), 1999.
[5] Fabien Gouyon et al., "A Flexible Analysis-Synthesis Method for Transients," ICMC, 2000.
[6] Xavier Rodet et al., "Detection and modeling of fast attack transients," ICMC, 2001.
[7] Bruno Torrésani et al., "Transient detection and encoding using wavelet coefficient trees," 2001.
[8] Mike E. Davies et al., "Improved Time-Scaling of Musical Audio Using Phase Locking at Transients," 2002.