Paper information - A comparison of local versus global image decompositions for visual speechreading

What is the appropriate spatial scale for image representation? In the primate visual system, receptive fields are small at early stages of processing (area V1), and larger at late stages of processing (areas MT, IT). In the current work, we explore the efficiency of local and global image representations on an automatic visual speech recognition task using an HMM as the recognition system. We compare local and global principal component and independent component image representations for the task. Local representations consistently and significantly outperformed global representations in terms of generalization to new speakers.

Michael S. Gray

[1] Terrence J. Sejnowski, et al. An Information-Maximization Approach to Blind Separation and Blind Deconvolution, 1995, Neural Computation.

[2] M. Turk, et al. Eigenfaces for Recognition, 1991, Journal of Cognitive Neuroscience.

[3] Terrence J. Sejnowski, et al. The "independent components" of natural scenes are edge filters, 1997, Vision Research.

[4] Penio S. Penev, et al. Local feature analysis: A general statistical theory for object representation, 1996.

[5] D. Field, et al. Natural image statistics and efficient coding, 1996, Network.

[6] Marian Stewart Bartlett, et al. Viewpoint Invariant Face Recognition using Independent Component Analysis and Attractor Networks, 1996, NIPS.

[7] Javier R. Movellan, et al. Visual Speech Recognition with Stochastic Networks, 1994, NIPS.