Phonetic Set Hashing: A Novel Scheme for Transforming Phone Sequences to Words

This paper re-evaluates how useful exact phone-order (sequence) information is when transforming phone sequences into words, and then proposes a novel scheme for this transformation, called phonetic set hashing. Each phone sequence is mapped onto its corresponding phone set, and the phone set is used as a key for indexing the matching words. Data-driven training strategies are used to alleviate the word segmentation problem. The robustness of phonetic set hashing to insertion, deletion, and substitution errors is also studied. Experiments on subsets of the TIMIT database indicate that phonetic set hashing is a simple, fast scheme for word pre-selection.
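The following minimal Python sketch illustrates the indexing step under stated assumptions. It is not the paper's implementation. The toy lexicon, the phone labels, and the helpers phone_set, build_index and preselect are hypothetical. The sketch assumes a plain set as the key, so both phone order and repeated phones are discarded. The optional max_diff tolerance is also an illustrative assumption. It accepts stored keys whose symmetric difference from the observed set is small, as one possible way to tolerate insertion, deletion, and substitution errors. It is not the method evaluated in the paper.

```python
from collections import defaultdict

# Hypothetical toy lexicon: word -> phone sequence (TIMIT-style labels).
LEXICON = {
    "cat": ["k", "ae", "t"],
    "act": ["ae", "k", "t"],
    "tack": ["t", "ae", "k"],
    "dog": ["d", "ao", "g"],
    "god": ["g", "aa", "d"],
}

def phone_set(phones):
    """Map a phone sequence to an order-free key (repeated phones collapse)."""
    return frozenset(phones)

def build_index(lexicon):
    """Index each word under the set of phones in its pronunciation."""
    index = defaultdict(set)
    for word, phones in lexicon.items():
        index[phone_set(phones)].add(word)
    return index

def preselect(index, phones, max_diff=0):
    """Return candidate words for an observed phone sequence.

    max_diff=0 is a single exact hash lookup. A larger max_diff (an
    assumption for illustration) scans all keys and accepts those whose
    symmetric difference from the observed set has at most max_diff phones.
    """
    key = phone_set(phones)
    if max_diff == 0:
        return set(index.get(key, set()))
    candidates = set()
    for stored_key, words in index.items():
        if len(key ^ stored_key) <= max_diff:
            candidates |= words
    return candidates

index = build_index(LEXICON)
print(sorted(preselect(index, ["t", "k", "ae"])))  # ['act', 'cat', 'tack']
```

With max_diff=0 the lookup is a single hash probe, which is what makes set hashing fast. In this sketch, an inserted or deleted phone changes the key by one element and a substitution changes it by two. The tolerant path scans every stored key and so gives up that speed. A practical system would need a cheaper way to search neighbouring keys.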