Cross-coding networks for speech classification

What kinds of internal representations develop in networks that transform the speech of one speaker into that of another? This paper addresses the question with a novel supervised coding scheme: cross-coding. Instead of performing auto-association, we train networks with intermediate bottlenecks to map the speech of many speakers onto the speech of a single reference speaker. The internal representations that develop are then fed to a second network trained to label the corresponding sounds. Interestingly, the cross-codings appear to capture speaker-invariant properties of the different sounds. Experiments on a multispeaker syllable recognition task show that the proposed scheme outperforms the corresponding multilayer network.
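The core idea can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' actual architecture: the dimensions, the synthetic data, and the single-bottleneck layout are all assumptions made for clarity. A cross-coder is trained to map frames X (many speakers) to frames Y (one reference speaker), and the bottleneck activations then serve as speaker-normalized features for a separate classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 16-dim spectral frames, 4-unit bottleneck.
D_IN, D_BOT, N = 16, 4, 200

# Synthetic stand-in data: frames X from "many speakers", paired with
# frames Y of the same sounds from a single reference speaker.
X = rng.standard_normal((N, D_IN))
Y = X @ (rng.standard_normal((D_IN, D_IN)) * 0.1)

# Cross-coder weights: input -> bottleneck -> output.
W1 = rng.standard_normal((D_IN, D_BOT)) * 0.1
W2 = rng.standard_normal((D_BOT, D_IN)) * 0.1

lr, losses = 0.01, []
for _ in range(500):
    H = np.tanh(X @ W1)   # bottleneck activations: the "cross-codings"
    P = H @ W2            # predicted reference-speaker frames
    E = P - Y             # cross-coding error (note Y != X: not auto-association)
    losses.append(float(np.mean(E ** 2)))
    # Backpropagate the mean-squared error through both layers.
    gW2 = H.T @ E / N
    gH = (E @ W2.T) * (1.0 - H ** 2)   # tanh derivative
    gW1 = X.T @ gH / N
    W1 -= lr * gW1
    W2 -= lr * gW2

# The bottleneck codes would then be input to a second, labeling network.
codes = np.tanh(X @ W1)
```

Because the target is another speaker's rendition of the same sound, the bottleneck is pressured to discard speaker-specific detail and keep what the sounds share, which is why these codes make useful classifier inputs.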
