Bayesian Inference in Nonlinear and Relational Latent Variable Models

Statistical data analysis is becoming increasingly important as growing amounts of data are collected in many fields of life. Automated learning algorithms provide a way to discover relevant concepts and representations that can then be used in analysis and decision making. Graphical models are an important subclass of statistical machine learning models that have clear semantics and a sound theoretical foundation. A graphical model is a graph whose nodes represent random variables and whose edges define the dependency structure between them. Bayesian inference determines the probability distribution over the unknown variables given the data. Graphical models are modular, that is, complex systems can be built by combining simple parts. Applying graphical models within the limits used in the 1980s is straightforward, but relaxing those strict assumptions is a challenging and active field of research. This thesis introduces, studies, and improves extensions of graphical models that can be roughly divided into two categories.

The first category involves nonlinear models inspired by neural networks. Variational Bayesian learning is used to counter overfitting and computational complexity. A framework is introduced in which efficient update rules are derived automatically for a model structure given by the user. Compared to similar existing systems, it provides new functionality such as nonlinearities and variance modelling. Variational Bayesian methods are applied to reconstructing corrupted data and to controlling a dynamic system. A new algorithm is developed for efficient and reliable inference in nonlinear state-space models.

The second category involves relational models. This means that observations may have distinctive internal structure and may be linked to each other. A novel method called the logical hidden Markov model is introduced for analysing sequences of logical atoms, and it is applied to classifying protein secondary structures. Algorithms for inference, parameter estimation, and structural learning are given. The thesis also introduces the first graphical model for analysing nonlinear dependencies in relational data.

Raiko, T. (2006): Bayesilainen päättely epälineaarisissa ja rakenteisissa piilomuuttujamalleissa [Bayesian Inference in Nonlinear and Relational Latent Variable Models]. Doctoral dissertation, Helsinki University of Technology, Dissertations in Computer and Information Science, Report D18, Espoo, Finland. Keywords: machine learning, graphical models, probabilistic inference, nonlinear models, variational methods, state-space models, hidden Markov model, inductive logic programming, first-order logic.
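
To make the abstract's definition concrete: in a graphical model the joint distribution factorises along the edges, and Bayesian inference sums out the unobserved variables and normalises over the observed ones. The sketch below uses a hypothetical three-node network (Rain and Sprinkler both pointing to WetGrass) with invented probabilities. It uses brute-force enumeration, which is only feasible for tiny networks and is not one of the thesis's algorithms; the names `joint` and `posterior_rain` are illustrative.

```python
# Hypothetical network: Rain -> WetGrass <- Sprinkler.
# All probability values are invented for illustration.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: 0.1, False: 0.9}
# P(WetGrass = True | Rain, Sprinkler)
P_WET = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain: bool, sprinkler: bool, wet: bool) -> float:
    """Joint probability, factorised along the edges as P(R) P(S) P(W | R, S)."""
    p_wet = P_WET[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1.0 - p_wet)


def posterior_rain(wet_observed: bool) -> dict:
    """P(Rain | WetGrass = wet_observed), summing out the unobserved Sprinkler."""
    unnormalised = {
        rain: sum(joint(rain, s, wet_observed) for s in (True, False))
        for rain in (True, False)
    }
    evidence = sum(unnormalised.values())  # P(WetGrass = wet_observed)
    return {rain: p / evidence for rain, p in unnormalised.items()}


print(round(posterior_rain(True)[True], 4))  # 0.7396
```

Observing wet grass raises the probability of rain from the prior 0.2 to about 0.74. The cost of enumeration grows exponentially with the number of unobserved variables, which is one reason approximate methods such as variational Bayes are needed for larger models.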
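
Variational Bayesian learning, used throughout the first part of the thesis, approximates the true posterior with a simpler factorised distribution whose factors are updated in turn. As a minimal textbook illustration (not the thesis's own building-block framework), the sketch below fits a Gaussian with unknown mean mu and precision tau under a Normal-Gamma prior, using the mean-field approximation q(mu)q(tau) and alternating the closed-form updates. The function name `vb_gaussian`, the hyperparameter defaults, and the fixed iteration count are assumptions chosen for the example.

```python
import random


def vb_gaussian(xs, mu0=0.0, lambda0=1.0, a0=1.0, b0=1.0, iters=50):
    """Mean-field VB for x_n ~ N(mu, 1/tau).

    Prior: mu | tau ~ N(mu0, 1/(lambda0 * tau)), tau ~ Gamma(a0, rate=b0).
    Approximation: q(mu) = N(mu_n, 1/lambda_n), q(tau) = Gamma(a_n, rate=b_n).
    """
    n = len(xs)
    xbar = sum(xs) / n
    # In this model the mean of q(mu) does not depend on E[tau].
    mu_n = (lambda0 * mu0 + n * xbar) / (lambda0 + n)
    a_n = a0 + (n + 1) / 2.0
    e_tau = a0 / b0  # initial guess for E[tau]
    for _ in range(iters):
        lambda_n = (lambda0 + n) * e_tau  # update q(mu) given E[tau]
        # Expected squared deviations under q(mu): (x - mu_n)^2 + 1/lambda_n
        e_sq_data = sum((x - mu_n) ** 2 for x in xs) + n / lambda_n
        e_sq_prior = lambda0 * ((mu_n - mu0) ** 2 + 1.0 / lambda_n)
        b_n = b0 + 0.5 * (e_sq_data + e_sq_prior)  # update q(tau)
        e_tau = a_n / b_n
    lambda_n = (lambda0 + n) * e_tau  # make q(mu) consistent with final q(tau)
    return mu_n, lambda_n, a_n, b_n


rng = random.Random(0)
data = [rng.gauss(2.0, 0.5) for _ in range(1000)]
mu_n, lambda_n, a_n, b_n = vb_gaussian(data)
print(f"E[mu] = {mu_n:.3f}, E[tau] = {a_n / b_n:.2f}")
```

With 1,000 samples the data dominate the prior, so E[mu] lands close to the sample mean and E[tau] = a_n / b_n roughly near 1/0.5² = 4, pulled slightly lower by the prior mean mu0 = 0. Unlike a point estimate, the result keeps an uncertainty (1/lambda_n, and the Gamma shape) for both unknowns.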
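
Logical hidden Markov models generalise ordinary hidden Markov models, whose states and observations are flat symbols, to sequences of logical atoms. For reference only, the sketch below shows the standard forward algorithm that computes the likelihood of an observation sequence in an ordinary discrete HMM; it handles flat symbols only, not logical atoms or any of the LOHMM-specific machinery described in the thesis. The function name `hmm_forward` and the two-state model are invented for the example.

```python
from typing import Sequence


def hmm_forward(
    obs: Sequence[int],
    start: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
) -> float:
    """Return P(obs) for a discrete HMM using the forward algorithm.

    start[i]    : P(first hidden state = i)
    trans[i][j] : P(next state = j | current state = i)
    emit[i][k]  : P(observation symbol k | state i)
    """
    n_states = len(start)
    # alpha[i] = P(observations so far, current hidden state = i)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)


# Invented two-state, two-symbol model.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]
print(round(hmm_forward([0, 1], START, TRANS, EMIT), 3))  # 0.209
```

The recursion costs O(T K²) for T observations and K hidden states, instead of the O(K^T) cost of summing over every possible state path.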
