Applying scalable phonetic context similarity in unit selection of concatenative text-to-speech