Evaluation of speaker normalization methods for vowel recognition using neural network and nearest‐neighbor classifiers

Intrinsic and extrinsic speaker normalization methods were compared using a neural network (fuzzy ARTMAP) and L1 and L2 K‐nearest neighbor (K‐NN) categorizers trained and tested on disjoint sets of speakers of the Peterson–Barney vowel database. Intrinsic methods included one nonscaled, four psychophysical scales (bark, bark with end correction, mel, ERB), and three log scales, each tested on four combinations of F0, F1, F2, F3. Extrinsic methods included four speaker adaptation schemes, each combined with the 32 intrinsic methods: centroid subtraction across all frequencies (CS), centroid subtraction for each frequency (CSi), linear scale (LS), and linear transformation (LT). ARTMAP and K‐NN showed similar trends, with K‐NN performing better but requiring about ten times as much memory. Among intrinsic methods, ARTMAP and K‐NN performed optimally using all the differences between bark‐scaled Fi (BDA). The ordering of performance for the extrinsic methods was LT, CSi, LS, and CS. For all extrinsic method...
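To make the normalization families concrete, the sketch below illustrates one intrinsic method (BDA: all pairwise differences between bark‐scaled formants, using Traunmüller's bark formula as an assumed variant) and two extrinsic methods (CS and CSi centroid subtraction). Function names and array layouts are illustrative, not those of the original study.

```python
import numpy as np

def bark(f_hz):
    # Traunmüller (1990) bark-scale conversion: one common variant,
    # assumed here; the study also tests bark with end correction.
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def intrinsic_bda(formants):
    # formants: [F0, F1, F2, F3] in Hz for a single vowel token.
    # BDA: all pairwise differences between bark-scaled Fi.
    # Intrinsic: uses only this token, no speaker-level statistics.
    z = bark(np.asarray(formants, dtype=float))
    n = len(z)
    return np.array([z[j] - z[i] for i in range(n) for j in range(i + 1, n)])

def extrinsic_cs(tokens):
    # tokens: (n_tokens, n_formants) array for ONE speaker.
    # CS: subtract the speaker's single centroid (grand mean over
    # all tokens and all frequencies) from every value.
    x = np.asarray(tokens, dtype=float)
    return x - x.mean()

def extrinsic_csi(tokens):
    # CSi: subtract a separate centroid per frequency channel,
    # i.e., one mean for each of F0, F1, F2, F3.
    x = np.asarray(tokens, dtype=float)
    return x - x.mean(axis=0, keepdims=True)

# Example: one token yields 6 BDA features from 4 formants.
features = intrinsic_bda([120.0, 500.0, 1500.0, 2500.0])
```

Extrinsic schemes such as CS and CSi require per-speaker statistics, so they adapt to a new speaker only after observing several tokens, whereas an intrinsic method like BDA normalizes each token in isolation.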