Continuous latent variable models for dimensionality reduction and sequential data reconstruction

Continuous latent variable models (cLVMs) are probabilistic models that represent a distribution in a high-dimensional Euclidean space using a small number of continuous, latent variables. This thesis explores, theoretically and practically, the use of cLVMs for dimensionality reduction and sequential data reconstruction. The first part of the thesis reviews and extends the theory of cLVMs: definition in terms of a prior distribution in latent space, a mapping to data space and a noise model; maximum likelihood parameter estimation with an expectation-maximisation (EM) algorithm; specific cLVMs (factor analysis, principal component analysis (PCA), independent component analysis, independent factor analysis and the generative topographic mapping (GTM)); mixtures of cLVMs; identifiability, interpretability and visualisation; and derivation of mappings for dimensionality reduction and reconstruction and their properties, such as continuity, for each cLVM. We extend GTM to diagonal noise and give a corresponding EM algorithm. We also describe a discrete LVM for binary data, Bernoulli mixtures, which is widely used in practice. We show that their log-likelihood surface has no singularities, unlike other mixture models, which makes EM estimation practical, and that their theoretical non-identifiability is rarely realised in actual estimates, which makes them interpretable.

The second part deals with dimensionality reduction. We define the problem and give an extensive, critical review of nonprobabilistic methods for it: linear methods (PCA, projection pursuit), nonlinear autoassociators, kernel methods, local dimensionality reduction, principal curves, vector quantisation methods (elastic net, self-organising map) and multidimensional scaling methods. We then empirically evaluate, in terms of reconstruction error, computation time and visualisation, several latent-variable methods for dimensionality reduction of binary electropalatographic (EPG) data: PCA, factor analysis, mixtures of factor analysers, GTM and Bernoulli mixtures. We compare these methods with earlier, nonadaptive EPG data reduction methods and derive 2D maps of EPG sequences for use in speech research and therapy.

The last part of this thesis proposes a new method for missing data reconstruction of sequential data that includes, as a particular case, the inversion of many-to-one mappings. We define the problem, distinguish it from inverse problems, and show when the two coincide. The method is based on multiple pointwise reconstruction and constraint optimisation. Multiple pointwise reconstruction uses a Gaussian mixture joint density model for the data, conveniently implemented with a nonlinear cLVM (GTM). The modes of the conditional distribution of missing values given present values at each point in the sequence represent local candidate reconstructions. A global sequence reconstruction is obtained by efficiently optimising a constraint, such as continuity or smoothness, with dynamic programming. We give a probabilistic interpretation of the method. We derive two algorithms for exhaustive mode finding in Gaussian mixtures, based on gradient-quadratic search and fixed-point search, respectively, as well as estimates of error bars for each mode and a measure of distribution sparseness. We discuss the advantages of the method over previous work based on the conditional mean or on universal mapping approximators (including ensembles and recurrent networks), conditional distribution estimation, vector quantisation and statistical analysis of missing data. We study the performance of the method with synthetic data (a toy example and an inverse kinematics problem) and real data (mapping between EPG and acoustic data). We describe the possible application of the method to several well-known reconstruction or inversion problems: decoding of neural population activity for hippocampal place cells; wind field retrieval from scatterometer data; inverse kinematics and dynamics of a redundant manipulator; acoustic-to-articulatory mapping; audiovisual mappings for speech recognition; and recognition of occluded speech.
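The reconstruction method summarised above (conditional modes as pointwise candidates, then a dynamic programming search over them) can be illustrated with a minimal Python sketch. It is not the thesis's implementation and rests on several assumptions. The Gaussian mixture is taken to have isotropic components with a shared variance `sigma2`, as in a standard GTM. The fixed-point search is started only from the component means. The continuity constraint is taken to be the sum of squared distances between consecutive points. The function names `conditional_mixture`, `gm_modes_fixed_point` and `dp_reconstruct` are hypothetical.

```python
import numpy as np

def conditional_mixture(means, weights, sigma2, x_obs, obs_idx):
    """Condition an isotropic Gaussian mixture on the observed coordinates.

    With a shared isotropic variance, the conditional is again a mixture
    with the same variance: the means are restricted to the missing
    coordinates and the weights are rescaled by the fit to x_obs.
    """
    mis_idx = np.setdiff1d(np.arange(means.shape[1]), obs_idx)
    d2 = np.sum((means[:, obs_idx] - x_obs) ** 2, axis=1)
    logw = np.log(weights) - 0.5 * d2 / sigma2
    w = np.exp(logw - logw.max())          # subtract max for stability
    return means[:, mis_idx], w / w.sum()

def gm_modes_fixed_point(means, weights, sigma2, tol=1e-8, max_iter=500):
    """Search for modes by iterating x <- sum_m p(m|x) mu_m from every mean.

    The update follows from setting the mixture gradient to zero; it finds
    stationary points, which in practice are usually modes.
    """
    modes = []
    for x in means.copy():
        for _ in range(max_iter):
            d2 = np.sum((means - x) ** 2, axis=1)
            logp = np.log(weights) - 0.5 * d2 / sigma2
            r = np.exp(logp - logp.max())
            r /= r.sum()                   # responsibilities p(m|x)
            x_new = r @ means
            converged = np.linalg.norm(x_new - x) < tol
            x = x_new
            if converged:
                break
        # Keep only points not already found (assumed merge radius 1e-4).
        if not any(np.linalg.norm(x - m) < 1e-4 for m in modes):
            modes.append(x)
    return np.array(modes)

def dp_reconstruct(candidates):
    """Pick one candidate per time step, minimising the total squared jump.

    candidates[t] is an array of shape (K_t, D) holding the candidate
    reconstructions (for example, conditional modes) at time t.
    """
    cost = np.zeros(len(candidates[0]))
    back = []
    for prev, cur in zip(candidates[:-1], candidates[1:]):
        # jump[i, j] = squared distance from prev[i] to cur[j]
        jump = np.sum((prev[:, None, :] - cur[None, :, :]) ** 2, axis=2)
        total = cost[:, None] + jump
        back.append(np.argmin(total, axis=0))
        cost = np.min(total, axis=0)
    j = int(np.argmin(cost))
    path = [j]
    for b in reversed(back):               # trace the best path backwards
        j = int(b[j])
        path.append(j)
    path.reverse()
    return np.array([c[k] for c, k in zip(candidates, path)])
```

In use, one would call `conditional_mixture` and `gm_modes_fixed_point` at each point of the sequence to collect candidate reconstructions, then pass the resulting list to `dp_reconstruct`. The dynamic programming step costs time proportional to the sequence length times the square of the number of candidates per point, which is what makes the global search tractable.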


[266]  Sean R. Eddy,et al.  Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[267]  E. Rolls,et al.  Neural networks and brain function , 1998 .

[268]  T. Ens,et al.  Blind signal separation : statistical principles , 1998 .

[269]  Steven Greenberg,et al.  Robust speech recognition using the modulation spectrogram , 1998, Speech Commun..

[270]  Christopher M. Bishop,et al.  Developments of the generative topographic mapping , 1998, Neurocomputing.

[271]  Kenneth Kreutz-Delgado,et al.  Learning Global Properties of Nonredundant Kinematic Mappings , 1998, Int. J. Robotics Res..

[272]  Miguel Á. Carreira-Perpiñán,et al.  Dimensionality reduction of electropalatographic data using latent variable models , 1998, Speech Commun..

[273]  Geoffrey E. Hinton,et al.  A View of the Em Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.

[274]  Christopher M. Bishop,et al.  Bayesian PCA , 1998, NIPS.

[275]  Li Deng,et al.  A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition , 1998, Speech Commun..

[276]  Tsuhan Chen,et al.  Audio-visual integration in multimodal communication , 1998, Proc. IEEE.

[277]  B L McNaughton,et al.  Interpreting neuronal population activity by reconstruction: unified framework with application to hippocampal place cells. , 1998, Journal of neurophysiology.

[278]  Stephen P. Brooks,et al.  Markov chain Monte Carlo method and its application , 1998 .

[279]  Mary J. Lindstrom,et al.  Differences among speakers in lingual articulation for American English /r/ , 1998, Speech Commun..

[280]  E N Brown,et al.  A Statistical Paradigm for Neural Spike Train Decoding Applied to Position Prediction from Ensemble Firing Patterns of Rat Hippocampal Place Cells , 1998, The Journal of Neuroscience.

[281]  Daniel D. Lee,et al.  Learning a Continuous Hidden Variable Model for Binary Data , 1998, NIPS.

[282]  Satoshi Nakamura,et al.  Lip movement synthesis from speech based on hidden Markov models , 1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition.

[283]  David A. Nix,et al.  Maximum-Likelihood Continuity Mapping (MALCOM): An Alternative to HMMs , 1998, NIPS.

[284]  H. Attias EM algorithms for independent component analysis , 1998, Neural Networks for Signal Processing VIII. Proceedings of the 1998 IEEE Signal Processing Society Workshop (Cat. No.98TH8378).

[285]  J. Scales,et al.  Bayesian seismic waveform inversion: Parameter estimation and uncertainty analysis , 1998 .

[286]  D M Wolpert,et al.  Multiple paired forward and inverse models for motor control , 1998, Neural Networks.

[287]  Christopher M. Bishop,et al.  A Hierarchical Latent Variable Model for Data Visualization , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[288]  Andrew R. Webb,et al.  Exploratory Data Analysis Using Radial Basis Function Latent Variable Models , 1998, NIPS.

[289]  Peter J. W. Rayner,et al.  Digital Audio Restoration: A Statistical Model Based Approach , 1998 .

[290]  Hani Yehia,et al.  Quantitative association of vocal-tract and facial behavior , 1998, Speech Commun..

[291]  William D. Penny,et al.  Bayesian Approaches to Gaussian Mixture Modeling , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[292]  Hagai Attias,et al.  Independent Factor Analysis , 1999, Neural Computation.

[293]  Christopher M. Bishop,et al.  Mixtures of Probabilistic Principal Component Analyzers , 1999, Neural Computation.

[294]  Stefano Panzeri,et al.  Firing Rate Distributions and Efficiency of Information Transmission of Inferior Temporal Cortex Neurons to Natural Visual Stimuli , 1999, Neural Computation.

[295]  Miguel Á. Carreira-Perpiñán,et al.  Reconstruction of Sequential Data with Probabilistic Models and Continuity Constraints , 1999, NIPS.

[296]  Thomas Villmann,et al.  Neural maps and topographic vector quantization , 1999, Neural Networks.

[297]  Nicholas I. Fisher,et al.  Bump hunting in high-dimensional data , 1999, Stat. Comput..

[298]  Aapo Hyvärinen,et al.  Survey on Independent Component Analysis , 1999 .

[299]  Sayan Mukherjee,et al.  Support Vector Method for Multivariate Density Estimation , 1999, NIPS.

[300]  Mahesan Niranjan,et al.  Parametric subspace modeling of speech transitions , 1999, Speech Commun..

[301]  Terrence J. Sejnowski,et al.  Independent Component Analysis Using an Extended Infomax Algorithm for Mixed Subgaussian and Supergaussian Sources , 1999, Neural Computation.

[302]  Dirk Husmeier,et al.  Neural Networks for Conditional Probability Estimation , 1999, Perspectives in Neural Computing.

[303]  Sam T. Roweis,et al.  Constrained Hidden Markov Models , 1999, NIPS.

[304]  I. Nabney,et al.  Bayesian retrieval of scatterometer wind fields , 1999 .

[305]  Aapo Hyvärinen,et al.  Fast and robust fixed-point algorithms for independent component analysis , 1999, IEEE Trans. Neural Networks.

[306]  B. Schölkopf,et al.  Advances in kernel methods: support vector learning , 1999 .

[307]  Gunnar Rätsch,et al.  Input space versus feature space in kernel-based methods , 1999, IEEE Trans. Neural Networks.

[308]  Miguel Á. Carreira-Perpiñán,et al.  A latent-variable modelling approach to the acoustic-to-articulatory mapping problem. I , 1999 .

[309]  Jon Barker,et al.  Evidence of correlation between acoustic and visual features of speech , 1999 .

[310]  David J. C. MacKay,et al.  Comparison of Approximate Methods for Handling Hyperparameters , 1999, Neural Computation.

[311]  A. Azzalini,et al.  Statistical applications of the multivariate skew normal distribution , 2009, 0911.2093.

[312]  Li Deng,et al.  Initial evaluation of hidden dynamic models on conversational speech , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[313]  Daniel P. W. Ellis,et al.  Speech and Audio Signal Processing - Processing and Perception of Speech and Music, Second Edition , 1999 .

[314]  Mike Schuster,et al.  On supervised learning from sequential data with applications for speech regognition , 1999 .

[315]  Simon King,et al.  Dynamical system modelling of articulator movement. , 1999 .

[316]  Matthew Brand,et al.  Structure Learning in Conditional Probability Models via an Entropic Prior and Parameter Extinction , 1999, Neural Computation.

[317]  John S. Bridle,et al.  The HDM: a segmental hidden dynamic model of coarticulation , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[318]  Zoubin Ghahramani,et al.  Variational Inference for Bayesian Mixtures of Factor Analysers , 1999, NIPS.

[319]  Shun-ichi Amari,et al.  Natural Gradient Learning for Over- and Under-Complete Bases in ICA , 1999, Neural Computation.

[320]  J. Weston,et al.  Support vector density estimation , 1999 .

[321]  Michael E. Tipping,et al.  Probabilistic Principal Component Analysis , 1999 .

[322]  Alan A Wrench,et al.  A MULTI-CHANNEL/MULTI-SPEAKER ARTICULATORY DATABASE FOR CONTINUOUS SPEECH RECOGNITION RESEARCH , 2000 .

[323]  Alexander S. Leonov,et al.  Estimation of stability and accuracy of inverse problem solution for the vocal tract , 2000, Speech Commun..

[324]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[325]  Nathalie Japkowicz,et al.  Nonlinear Autoassociation Is Not Equivalent to PCA , 2000, Neural Computation.

[326]  J L Gallant,et al.  Sparse coding and decorrelation in primary visual cortex during natural vision. , 2000, Science.

[327]  Terrence J. Sejnowski,et al.  Learning Overcomplete Representations , 2000, Neural Computation.

[328]  Adam Krzyzak,et al.  Learning and Design of Principal Curves , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[329]  Zoubin Ghahramani,et al.  Computational principles of movement neuroscience , 2000, Nature Neuroscience.

[330]  Erkki Oja,et al.  Independent component analysis: algorithms and applications , 2000, Neural Networks.

[331]  Jon Barker,et al.  Soft decisions in missing data techniques for robust automatic speech recognition , 2000, INTERSPEECH.

[332]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[333]  Dan Cornford,et al.  Bayesian inference for wind field retrieval , 2000, Neurocomputing.

[334]  Miguel Á. Carreira-Perpiñán,et al.  Mode-Finding for Mixtures of Gaussian Distributions , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[335]  Simon King,et al.  An automatic speech recognition system using neural networks and linear dynamic models to recover and model articulatory traces , 2000, INTERSPEECH.

[336]  Thomas de Quincey [C] , 2000, The Works of Thomas De Quincey, Vol. 1: Writings, 1799–1820.

[337]  Geoffrey J. McLachlan,et al.  Robust mixture modelling using the t distribution , 2000, Stat. Comput..

[338]  Geoffrey E. Hinton,et al.  SMEM Algorithm for Mixture Models , 1998, Neural Computation.

[339]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[340]  B Willmore,et al.  A Comparison of Natural-Image-Based Models of Simple-Cell Coding , 2000, Perception.

[341]  Miguel Á. Carreira-Perpiñán,et al.  Practical Identifiability of Finite Mixtures of Multivariate Bernoulli Distributions , 2000, Neural Computation.

[342]  Lawrence K. Saul,et al.  Maximum likelihood and minimum classification error factor analysis for automatic speech recognition , 2000, IEEE Trans. Speech Audio Process..

[343]  Dan Cornford,et al.  Structured neural network modelling of multi-valued functions for wind vector retrieval from satellite scatterometer measurements , 2000, Neurocomputing.

[344]  C S Blackburn,et al.  A self-learning predictive model of articulator movements during speech production. , 2000, The Journal of the Acoustical Society of America.

[345]  William J. J. Roberts,et al.  Hidden Markov modeling of speech using Toeplitz covariance matrices , 2000, Speech Commun..

[346]  R. Snieder Inverse Problems in Geophysics , 2001 .

[347]  Phil D. Green,et al.  Robust automatic speech recognition with missing and unreliable acoustic data , 2001, Speech Commun..

[348]  Daniel P. W. Ellis,et al.  The auditory organization of speech and other sources in listeners and computational models , 2001, Speech Commun..

[349]  A. Utsugi,et al.  Bayesian Analysis of Mixtures of Factor Analyzers , 2001, Neural Computation.

[350]  Stan Lipovetsky,et al.  Latent Variable Models and Factor Analysis , 2001, Technometrics.

[351]  Christopher K. I. Williams,et al.  Modelling Frontal Discontinuities in Wind Fields , 2002 .

[352]  James R. Schott,et al.  Principles of Multivariate Analysis: A User's Perspective , 2002 .

[353]  J. Wade Davis,et al.  Statistical Pattern Recognition , 2003, Technometrics.