A comparison of local versus global image decompositions for visual speechreading

What is the appropriate spatial scale for image representation? In the primate visual system, receptive fields are small at early stages of processing (area V1), and larger at late stages of processing (areas MT, IT). In the current work, we explore the efficiency of local and global image representations on an automatic visual speech recognition task using an HMM as the recognition system. We compare local and global principal component and independent component image representations for the task. Local representations consistently and significantly outperformed global representations in terms of generalization to new speakers.
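
To make the local/global distinction concrete, the following is a minimal sketch of one way the two kinds of principal component representation could be computed. It is not the procedure used in this work. The patch size, the number of components, the non-overlapping patch grid, the function names, and the random array standing in for mouth-region frames are all assumptions made for illustration. Only the PCA case is shown; an independent component version would replace the basis estimation step.

```python
import numpy as np

def pca_basis(X, k):
    """Return the mean and the top-k principal axes (as rows) of X (samples x features)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def global_pca_features(images, k):
    """Global representation: each basis image spans the whole frame."""
    X = images.reshape(len(images), -1)
    mean, axes = pca_basis(X, k)
    return (X - mean) @ axes.T

def extract_patches(images, p):
    """Cut each frame into non-overlapping p x p patches (assumed layout)."""
    n, h, w = images.shape
    images = images[:, :h - h % p, :w - w % p]  # crop to a multiple of p
    gh, gw = images.shape[1] // p, images.shape[2] // p
    patches = images.reshape(n, gh, p, gw, p).transpose(0, 1, 3, 2, 4)
    return patches.reshape(n, gh * gw, p * p)

def local_pca_features(images, p, k):
    """Local representation: a basis learned on small patches, applied at every patch location."""
    patches = extract_patches(images, p)
    n, m, d = patches.shape
    mean, axes = pca_basis(patches.reshape(n * m, d), k)
    coeffs = (patches - mean) @ axes.T  # shape (n, m, k)
    return coeffs.reshape(n, m * k)

# Hypothetical stand-in data: 40 frames of 24 x 32 pixels.
rng = np.random.default_rng(0)
frames = rng.standard_normal((40, 24, 32))

global_feats = global_pca_features(frames, k=10)
local_feats = local_pca_features(frames, p=8, k=4)
print(global_feats.shape, local_feats.shape)  # (40, 10) (40, 48)
```

In this sketch each frame yields one feature vector under either representation, so a sequence of frames could be passed to a sequence model such as an HMM in the same way for both.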