Separating Style and Content with Bilinear Models

Perceptual systems routinely separate content from style, classifying familiar words spoken in an unfamiliar accent, identifying a font or handwriting style across letters, or recognizing a familiar face or object seen under unfamiliar viewing conditions. Yet a general and tractable computational model of this ability to untangle the underlying factors of perceptual observations remains elusive (Hofstadter, 1985). Existing factor models (Mardia, Kent, & Bibby, 1979; Hinton & Zemel, 1994; Ghahramani, 1995; Bell & Sejnowski, 1995; Hinton, Dayan, Frey, & Neal, 1995; Dayan, Hinton, Neal, & Zemel, 1995; Hinton & Ghahramani, 1997) are either insufficiently rich to capture the complex interactions of perceptually meaningful factors such as phoneme and speaker accent or letter and font, or do not allow efficient learning algorithms. We present a general framework for learning to solve two-factor tasks using bilinear models, which provide sufficiently expressive representations of factor interactions but can nonetheless be fit to data using efficient algorithms based on the singular value decomposition and expectation-maximization. We report promising results on three different tasks in three different perceptual domains: spoken vowel classification with a benchmark multi-speaker database, extrapolation of fonts to unseen letters, and translation of faces to novel illuminants.
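As a rough illustration of how a bilinear model can be fit with the singular value decomposition, the sketch below assumes one common asymmetric form, in which each observation is approximated as a style-specific matrix applied to a content vector. The abstract does not spell out this form, so the model equation, the function name `fit_asymmetric_bilinear`, the stacking convention, and the synthetic data are all illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def fit_asymmetric_bilinear(Y, n_styles, dim):
    """Fit y(s, c) ~= A[s] @ B[:, c] by a truncated SVD (illustrative sketch).

    Y is assumed to stack the K-dimensional observations of each style
    vertically, one column per content class: shape (n_styles * K, n_contents).
    Returns A with shape (n_styles, K, dim) and B with shape (dim, n_contents).
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # Keep the leading `dim` components; fold the singular values into A.
    A = (U[:, :dim] * s[:dim]).reshape(n_styles, -1, dim)
    B = Vt[:dim, :]
    return A, B

# Synthetic data generated from an exactly bilinear model (assumption).
rng = np.random.default_rng(0)
n_styles, n_contents, K, dim = 4, 6, 10, 3
A_true = rng.normal(size=(n_styles, K, dim))
B_true = rng.normal(size=(dim, n_contents))
Y = (A_true @ B_true).reshape(n_styles * K, n_contents)

A, B = fit_asymmetric_bilinear(Y, n_styles, dim)
recon = (A @ B).reshape(n_styles * K, n_contents)
max_err = float(np.abs(recon - Y).max())
```

Because the synthetic data are exactly rank `dim`, the truncated SVD reconstructs them up to floating-point error. With real, noisy observations the same truncation would give a least-squares fit, and the recovered factors are determined only up to an invertible linear transformation.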

[1]  N. L. Johnson,et al.  Multivariate Analysis , 1958, Nature.

[2]  D. A. Dunnett Classical Electrodynamics , 2020, Nature.

[3]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[4]  Peter E. Hart,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[5]  M. Brereton Classical Electrodynamics (2nd edn) , 1976 .

[6]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm (with discussion) , 1977 .

[7]  H. Barrow,et al.  RECOVERING INTRINSIC SCENE CHARACTERISTICS FROM IMAGES , 1978 .

[8]  R. M. Siegel,et al.  Encoding of spatial location by posterior parietal neurons. , 1985, Science.

[9]  Florien J. van Beinum,et al.  Perceptual normalization of the vowels of a man and a child in various contexts , 1988, Speech Commun..

[10]  J. Magnus,et al.  Matrix Differential Calculus with Applications in Statistics and Econometrics (Revised Edition) , 1999 .

[11]  Erkki Oja,et al.  Neural Networks, Principal Components, and Subspaces , 1989, Int. J. Neural Syst..

[12]  Terence D. Sanger,et al.  Optimal unsupervised learning in a single-layer linear feedforward neural network , 1989, Neural Networks.

[13]  Lawrence Sirovich,et al.  Application of the Karhunen-Loeve Procedure for the Characterization of Human Faces , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Patrick Cavanagh,et al.  What's up in top-down processing? , 1991 .

[15]  M. Turk,et al.  Eigenfaces for Recognition , 1991, Journal of Cognitive Neuroscience.

[16]  J. Magnus,et al.  Matrix Differential Calculus with Applications in Statistics and Econometrics , 1991 .

[17]  M. Landy,et al.  A Bilinear Model of the Illuminant's Effect on Color Appearance , 1991 .

[18]  M. D'Zmura Color constancy : surface color from changing illumination , 1992 .

[19]  T. Sanocki Effects of font- and letter-specific experience on the perceptual processing of letters , 1992 .

[20]  Yann LeCun,et al.  Efficient Pattern Recognition Using a New Transformation Distance , 1992, NIPS.

[21]  Richard Szeliski,et al.  Surface modeling with oriented particle systems , 1992, SIGGRAPH.

[22]  David G. Stork,et al.  Connectionist generalization for production: An example from GridFont , 1992, Neural Networks.

[23]  William H. Press,et al.  The Art of Scientific Computing Second Edition , 1998 .

[24]  B A Wandell,et al.  Linear models of surface and illuminant spectra. , 1992, Journal of the Optical Society of America. A, Optics and image science.

[25]  Geoffrey E. Hinton,et al.  Autoencoders, Minimum Description Length and Helmholtz Free Energy , 1993, NIPS.

[26]  Tomaso Poggio,et al.  Observations on Cortical Mechanisms for Object Recognition and Learning , 1993 .

[27]  D. V. van Essen,et al.  A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information , 1993, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[28]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[29]  Peter W. Hallinan A low-dimensional representation of human faces for arbitrary lighting conditions , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Patrick C. Teo,et al.  Perceptual image distortion , 1994, Proceedings of 1st International Conference on Image Processing.

[31]  Zoubin Ghahramani,et al.  Factorial Learning and the EM Algorithm , 1994, NIPS.

[32]  Stephen M. Omohundro Family Discovery , 1995, NIPS.

[33]  Terrence J. Sejnowski,et al.  An Information-Maximization Approach to Blind Separation and Blind Deconvolution , 1995, Neural Computation.

[34]  Geoffrey E. Hinton,et al.  The Helmholtz Machine , 1995, Neural Computation.

[35]  Geoffrey E. Hinton,et al.  The "wake-sleep" algorithm for unsupervised neural networks. , 1995, Science.

[36]  Tom Heskes,et al.  A Neural Model of Visual Attention , 1995, SNN Symposium on Neural Networks.

[37]  T. Sejnowski,et al.  A selection model for motion processing in area MT of primates , 1995, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[38]  D. Hofstadter Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought , 1995 .

[39]  Douglas R. Hofstadter,et al.  Fluid Concepts and Creative Analogies , 1995 .

[40]  Robert Tibshirani,et al.  Discriminant Adaptive Nearest Neighbor Classification , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[41]  L. Abbott,et al.  A model of multiplicative neural responses in parietal cortex. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[42]  Tomaso Poggio,et al.  Image Representations for Visual Learning , 1996, Science.

[43]  Paul A. Griffin,et al.  Statistical Approach to Shape from Shading: Reconstruction of Three-Dimensional Face Surfaces from Single Two-Dimensional Images , 1996, Neural Computation.

[44]  S. Nayar,et al.  Early Visual Learning , 1996 .

[45]  Peter Dayan,et al.  Neural Models for Part-Whole Hierarchies , 1996, NIPS.

[46]  Joshua B. Tenenbaum,et al.  Separating Style and Content , 1996, NIPS.

[47]  Joshua B. Tenenbaum,et al.  Learning bilinear models for two-factor problems in vision , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[48]  J. Atick,et al.  Statistical Approach to Shape from Shading: Reconstruction of 3D Face Surfaces from Single 2D , 1997 .

[49]  Christof Koch,et al.  Computation and the single neuron , 1997, Nature.

[50]  Alex Pentland,et al.  Probabilistic Visual Learning for Object Representation , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[51]  Geoffrey E. Hinton,et al.  Generative models for discovering sparse distributed representations. , 1997, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[52]  Thad A. Polk,et al.  A Simple Common Contexts Explanation for the Development of Abstract Letter Identities , 1997, Neural Computation.

[53]  Patrick Cavanagh,et al.  Recovery of 3D volume from 2-tone images of novel objects , 1998, Cognition.

[54]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[55]  滋 篠本,et al.  Computation and the single neuron , 1998 .

[56]  D. Pisoni,et al.  Talker-specific learning in speech perception , 1998, Perception & psychophysics.

[57]  David J. Miller,et al.  Critic-driven ensemble classification , 1999, IEEE Trans. Signal Process..

[58]  Aaron F. Bobick,et al.  Parametric Hidden Markov Models for Gesture Recognition , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[59]  William H. Press,et al.  Numerical recipes in C , 2002 .