Recent improvements of an auditory-model-based front-end for the transcription of vocal queries

In this paper, recent improvements to an existing acoustic front-end for the transcription of vocal (hummed, sung) musical queries are presented. Thanks to the addition of a second pitch extractor and the introduction of a novel multi-stage segmentation algorithm, the application domain of the front-end could be extended to whistled queries, and the performance on the other two query types could be improved as well. Experiments have shown that the new system can transcribe vocal queries with an accuracy ranging from 76 % (whistling) to 85 % (humming), and that it clearly outperforms other state-of-the-art systems on all three query types.