Deep audio embeddings for vocalisation clustering