Hand-crafted versus learned representations for audio event detection