Active inference on discrete state-spaces: A synthesis

Active inference is a normative principle underwriting perception, action, planning, decision-making and learning in biological or artificial agents. Since its inception, its associated process theory has grown to incorporate complex generative models, enabling simulation of a wide range of complex behaviours. Because of these successive developments, it is often difficult to see how the underlying principle relates to process theories and practical implementation. In this paper, we try to bridge this gap by providing a complete mathematical synthesis of active inference on discrete state-space models. This technical summary provides an overview of the theory, derives neuronal dynamics from first principles and relates these dynamics to biological processes. Furthermore, this paper provides a fundamental building block needed to understand active inference for mixed generative models, which allow continuous sensations to inform discrete representations. This paper may be used in three ways: as a guide for research towards outstanding challenges, as a practical guide on how to implement active inference to simulate experimental behaviour, or as a pointer towards various in-silico neurophysiological responses that may be used to make empirical predictions.
