Category Learning Through Multimodality Sensing

Humans and other animals learn to form complex categories without receiving a target output, or teaching signal, with each input pattern. In contrast, most computer algorithms that emulate such performance assume the brain is provided with the correct output at the neuronal level or require grossly unphysiological methods of information propagation. Natural environments do not contain explicit labeling signals, but they do contain important information in the form of temporal correlations between sensations in different sensory modalities, and humans are affected by this correlational structure (Howells, 1944; McGurk & MacDonald, 1976; MacDonald & McGurk, 1978; Zellner & Kautz, 1990; Durgin & Proffitt, 1996). In this article we describe a simple, unsupervised neural network algorithm that also uses this natural structure. Using only the co-occurring patterns of lip motion and sound signals from a human speaker, the network learns separate visual and auditory speech classifiers that perform comparably to supervised networks.
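
The abstract does not spell out the learning rule, so what follows is only a toy sketch, under stated assumptions, of the general idea that co-occurrence between modalities can replace an explicit teaching signal. It is not the article's algorithm. It assumes synthetic two-dimensional "visual" and "auditory" features drawn from three hidden categories, k-means clustering within each modality, a co-occurrence count that ties the two sets of clusters to shared category ids, and a refinement step in the style of Kohonen's LVQ1 in which each modality's current label serves as the target for the other. Every name and parameter here (make_pairs, train, score, lr, epochs, the category means) is hypothetical.

```python
import math
import random

# Hypothetical synthetic data: each event comes from one of three hidden
# categories and yields a paired 2-D "visual" and 2-D "auditory" observation.
VIS_MEANS = [(-4.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
AUD_MEANS = [(0.0, -4.0), (4.0, 4.0), (-4.0, 4.0)]


def make_pairs(n, noise=0.5, seed=0):
    """Return (visual, auditory, hidden_class); hidden_class is used only for scoring."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        c = rng.randrange(len(VIS_MEANS))
        v = tuple(m + rng.gauss(0.0, noise) for m in VIS_MEANS[c])
        a = tuple(m + rng.gauss(0.0, noise) for m in AUD_MEANS[c])
        pairs.append((v, a, c))
    return pairs


def nearest(protos, x):
    """Index of the prototype closest to x in Euclidean distance."""
    return min(range(len(protos)), key=lambda i: math.dist(protos[i], x))


def kmeans(points, k, iters=20):
    """Unlabeled k-means with deterministic farthest-point initialization."""
    protos = [list(points[0])]
    while len(protos) < k:
        far = max(points, key=lambda p: min(math.dist(p, q) for q in protos))
        protos.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(protos, p)].append(p)
        for i, g in enumerate(groups):
            if g:
                protos[i] = [sum(col) / len(g) for col in zip(*g)]
    return protos


def train(pairs, k, lr=0.05, epochs=2):
    # 1. Cluster each modality on its own; the hidden class is never read here.
    vis = kmeans([v for v, _, _ in pairs], k)
    aud = kmeans([a for _, a, _ in pairs], k)
    # 2. Co-occurrence counts tie each auditory cluster to the visual cluster it
    #    appears with most often; visual cluster indices serve as category ids.
    counts = [[0] * k for _ in range(k)]
    for v, a, _ in pairs:
        counts[nearest(vis, v)][nearest(aud, a)] += 1
    aud_label = [max(range(k), key=lambda i: counts[i][j]) for j in range(k)]
    # 3. LVQ1-style refinement where each modality's label teaches the other:
    #    winners move toward their inputs when the modalities agree, away otherwise.
    for _ in range(epochs):
        for v, a, _ in pairs:
            iv, ja = nearest(vis, v), nearest(aud, a)
            step = lr if iv == aud_label[ja] else -lr
            vis[iv] = [p + step * (x - p) for p, x in zip(vis[iv], v)]
            aud[ja] = [p + step * (x - p) for p, x in zip(aud[ja], a)]
    return vis, aud, aud_label


def score(pairs, vis, aud, aud_label):
    """Cross-modal agreement and per-modality accuracy, mapping each learned
    category id to the hidden class it most often coincides with."""
    true = [c for _, _, c in pairs]
    pred_v = [nearest(vis, v) for v, _, _ in pairs]
    pred_a = [aud_label[nearest(aud, a)] for _, a, _ in pairs]

    def accuracy(pred):
        hits = 0
        for lab in set(pred):
            members = [t for p, t in zip(pred, true) if p == lab]
            hits += max(members.count(c) for c in set(members))
        return hits / len(pred)

    agree = sum(x == y for x, y in zip(pred_v, pred_a)) / len(pairs)
    return agree, accuracy(pred_v), accuracy(pred_a)


vis, aud, aud_label = train(make_pairs(300, seed=0), k=3)
agree, acc_v, acc_a = score(make_pairs(200, seed=1), vis, aud, aud_label)
print(f"agreement={agree:.2f} visual={acc_v:.2f} auditory={acc_a:.2f}")
```

The hidden class is used only to generate the data and to score the result; training sees nothing but the paired inputs. Because the learned category ids are arbitrary, the scoring step maps each id to the hidden class it most often coincides with, which is the same permutation ambiguity any unsupervised classifier faces.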

References

[1] Arnaldo Spalvieri, et al. Pattern classification by the Bayes machine. 1995.

[2] C. R. Michael, et al. Integration of auditory information in the cat's visual cortex. Vision Research, 1973.

[3] D. Zellner, et al. Color affects perceived odor intensity. Journal of Experimental Psychology: Human Perception and Performance, 1990.

[4] Richard Granger, et al. A cortical model of winner-take-all competition via lateral inhibition. Neural Networks, 1992.

[5] Ah-Hwee Tan, et al. Adaptive resonance associative map. Neural Networks, 1995.

[6] Dana H. Ballard, et al. Self-Teaching Through Correlated Input. 1993.

[7] Stephen Grossberg, et al. ARTMAP: supervised real-time learning and classification of nonstationary data by a self-organizing neural network. IEEE Conference on Neural Networks for Ocean Engineering, 1991.

[8] Paul Munro. Self-supervised Learning of Concepts by Single Units and "Weakly Local" Representations. 1988.

[9] Frank Morrell, et al. Visual System's View of Acoustic Space. Nature, 1972.

[10] Dana H. Ballard, et al. A Note on Learning Vector Quantization. NIPS, 1992.

[11] J. Maunsell, et al. Extraretinal representations in area V4 in the macaque monkey. Visual Neuroscience, 1991.

[12] Naohiro Ishii, et al. A self-supervised learning system for pattern recognition by sensory integration. Neural Networks, 1999.

[13] Jürgen Schmidhuber, et al. Discovering Predictable Classifications. Neural Computation, 1993.

[14] Jim Kay, et al. The discovery of structure by multi-stream networks of local processors with contextual guidance. 1995.

[15] Teuvo Kohonen, et al. Improved versions of learning vector quantization. IJCNN International Joint Conference on Neural Networks, 1990.

[16] Gregory J. Wolff, et al. Neural network lipreading system for improved speech recognition. IJCNN International Joint Conference on Neural Networks, 1992.

[17] Joanne L. Miller, et al. Speech Perception. Springer Handbook of Auditory Research, 1990.

[18] D. R. Proffitt, et al. Visual learning in the perception of texture: simple and contingent aftereffects of texture density. Spatial Vision, 1996.

[19] Terrence J. Sejnowski, et al. Neural network models of sensory integration for improved vowel recognition. Proceedings of the IEEE, 1990.

[20] Geoffrey E. Hinton, et al. Learning internal representations by error propagation. 1986.

[21] Jack Sklansky, et al. Pattern Classifiers and Trainable Machines. 1981.

[22] W. Singer, et al. Long-term depression of excitatory synaptic transmission and its relationship to long-term potentiation. Trends in Neurosciences, 1993.

[24] R. Hari, et al. Seeing speech: visual information from lip movements modifies activity in the human auditory cortex. Neuroscience Letters, 1991.

[25] Risto Miikkulainen, et al. Self-Organizing Process Based On Lateral Inhibition And Synaptic Resource Redistribution. 1991.

[26] P. Buser, et al. Réponses somesthésiques, visuelles et auditives, recueillies au niveau du cortex «associatif» non anesthésié [Somesthetic, visual, and auditory responses recorded in the unanesthetized "associative" cortex]. 1959.

[27] James L. McClelland, et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations. 1986.

[28] E. Capaldi, et al. The organization of behavior. Journal of Applied Behavior Analysis, 1992.

[29] T. Howells. The experimental development of color-tone synesthesia. 1944.

[30] Geoffrey E. Hinton, et al. Self-organizing neural network that discovers surfaces in random-dot stereograms. Nature, 1992.

[31] H. McGurk, et al. Visual influences on speech perception processes. Perception & Psychophysics, 1978.

[32] Suzanna Becker, et al. Mutual information maximization: models of cortical self-organization. Network, 1996.

[33] D. N. Spinelli, et al. Auditory specificity in unit recordings from cat's visual cortex. Experimental Neurology, 1968.

[34] R. O’Reilly. Six principles for biologically based computational models of cortical cognition. Trends in Cognitive Sciences, 1998.

[35] K. Murata, et al. Neuronal convergence of noxious, acoustic, and visual stimuli in the visual cortex of the cat. Journal of Neurophysiology, 1965.

[36] H. McGurk, et al. Hearing lips and seeing voices. Nature, 1976.

[37] S. Grossberg. Adaptive pattern classification and universal recoding. 1976.

[38] Jack Sklansky, et al. Training a One-Dimensional Classifier to Minimize the Probability of Error. IEEE Transactions on Systems, Man, and Cybernetics, 1972.

[39] James M. Bower, et al. Computation and Neural Systems. Springer US, 2014.

[40] David Zipser, et al. Feature Discovery by Competitive Learning. Cognitive Science, 1986.

[41] Dario Floreano, et al. Contextually guided unsupervised learning using local multivariate binary processors. Neural Networks, 1998.

[42] Ramprasad Polana, et al. Temporal texture and activity recognition. 1994.