Automatic transcription of drum sequences using audiovisual features

Transcribing a musical performance from its audio signal is often problematic. Either it requires separating complex sources, or some important high-level musical information simply cannot be extracted directly from the audio. We propose a novel multimodal approach to the transcription of drum sequences using audiovisual features. Transcription is performed by support vector machine (SVM) classifiers, and three different information fusion strategies are evaluated. A correct recognition rate of 85.8% can be achieved with a detailed taxonomy and fully automated transcription.