i-speech: Experiments with a Level Feedback Paradigm

Speech from multiple speakers is transformed into speech produced by a single speaker (speaker normalization) using cross-coding networks. Internal representations for classification are acquired by feeding the resulting internal speech (i-speech) back into the network. Training proceeds by unfolding the network through time and combining the classification error with the intermediate speaker-normalization errors. Experimental results on multi-speaker syllable recognition tasks are discussed, both for speakers seen in training and for new speakers.
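
To make the training objective concrete, the following is a minimal sketch of how a classification error might be combined with per-step normalization errors when a feedback network is unfolded through time. It is not the architecture used in this work: the single hidden layer, the tanh nonlinearity, the mean-squared normalization error, the softmax cross-entropy classification error, the weighting factor `alpha`, and all names (`unrolled_loss`, `W_in`, `W_fb`, `W_out`, `W_cls`) are assumptions chosen for illustration. Only the forward pass and the combined error are shown; gradients would be obtained by backpropagating through the unfolded steps.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def random_matrix(rows, cols, rng):
    """Small random weights; purely illustrative initialization."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def unrolled_loss(x, target, label, params, steps=3, alpha=0.5):
    """Unfold a toy feedback network for `steps` iterations.

    x      : input frame from an arbitrary speaker (hypothetical features)
    target : the same frame as spoken by the reference speaker
    label  : index of the correct class (e.g. a syllable)
    Returns (total error, classification error, per-step normalization errors).
    """
    W_in, W_fb, W_out, W_cls = params
    i_speech = [0.0] * len(target)  # no internal speech before the first step
    norm_errors = []
    for _ in range(steps):
        # The hidden layer sees the input plus the i-speech fed back from the last step.
        pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_fb, i_speech))]
        hidden = [math.tanh(p) for p in pre]
        i_speech = matvec(W_out, hidden)
        # Intermediate speaker-normalization error (mean squared error, assumed).
        mse = sum((y - t) ** 2 for y, t in zip(i_speech, target)) / len(target)
        norm_errors.append(mse)
    # Classify from the final internal representation (softmax cross-entropy, assumed).
    logits = matvec(W_cls, hidden)
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    class_error = log_z - logits[label]
    total = class_error + alpha * sum(norm_errors)
    return total, class_error, norm_errors


rng = random.Random(0)
n_in, n_hidden, n_speech, n_classes = 4, 5, 4, 3
params = (
    random_matrix(n_hidden, n_in, rng),      # W_in: input to hidden
    random_matrix(n_hidden, n_speech, rng),  # W_fb: i-speech feedback to hidden
    random_matrix(n_speech, n_hidden, rng),  # W_out: hidden to i-speech
    random_matrix(n_classes, n_hidden, rng), # W_cls: hidden to class logits
)
x = [rng.uniform(-1, 1) for _ in range(n_in)]
target = [rng.uniform(-1, 1) for _ in range(n_speech)]
total, class_error, norm_errors = unrolled_loss(x, target, label=1, params=params)
```

In this sketch `alpha` trades off how strongly the intermediate normalization targets constrain the internal representation relative to the classification objective; the value 0.5 is arbitrary.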