Learning neural audio features without supervision