Policy compression: An information bottleneck in action selection

[1]  W. Brown Animal Intelligence: Experimental Studies , 1912, Nature.

[2]  E. A. Berg,et al.  A simple objective technique for measuring flexibility in thinking. , 1948, The Journal of general psychology.

[3]  C. E. Shannon A mathematical theory of communication , 1948, The Bell System Technical Journal.

[4]  K. Lashley The problem of serial order in behavior , 1951 .

[5]  F. Mosteller,et al.  An Experimental Measurement of Utility , 1951, Journal of Political Economy.

[6]  D. Huffman A Method for the Construction of Minimum-Redundancy Codes , 1952 .

[7]  W. E. Hick On the rate of gain of information , 1952, Quarterly Journal of Experimental Psychology.

[8]  W. S. Verplanck,et al.  Nonindependence of successive responses in measurements of the visual threshold. , 1952, Journal of experimental psychology.

[9]  R. Hyman Stimulus information as a determinant of reaction time. , 1953, Journal of experimental psychology.

[10]  C. I. Howarth,et al.  Non-Random Sequences in Visual Threshold Experiments , 1956 .

[11]  G. A. Miller The magical number seven, plus or minus two: some limits on our capacity for processing information. , 1956, Psychological review.

[12]  John von Neumann,et al.  The Computer and the Brain , 1960 .

[13]  M. V. Rhoades,et al.  On the Reduction of Choice Reaction Times with Practice , 1959 .

[14]  R. Seibel Discrimination reaction time for a 1,023-alternative task. , 1963, Journal of experimental psychology.

[15]  P. Bertelson,et al.  Serial Choice Reaction-time as a Function of Response versus Signal-and-Response Repetition , 1965, Nature.

[16]  David Hale,et al.  The relation of correct and error responses in a serial choice reaction task , 1968 .

[17]  Toby Berger,et al.  Rate distortion theory : a mathematical basis for data compression , 1971 .

[18]  Suguru Arimoto,et al.  An algorithm for computing the capacity of arbitrary discrete memoryless channels , 1972, IEEE Trans. Inf. Theory.

[19]  Richard E. Blahut,et al.  Computation of channel capacity and rate-distortion functions , 1972, IEEE Trans. Inf. Theory.

[20]  W H Teichner,et al.  Laws of visual choice reaction time. , 1974, Psychological review.

[21]  D. Norman Categorization of action slips. , 1981 .

[22]  A. Dickinson Actions and habits: the development of behavioural autonomy , 1985 .

[23]  M. Nissen,et al.  Attentional requirements of learning: Evidence from performance measures , 1987, Cognitive Psychology.

[24]  David Haussler,et al.  Occam's Razor , 1987, Inf. Process. Lett..

[25]  L. E. Longstreth,et al.  Hick’s law: Its limit is 3 bits , 1988 .

[26]  H S Terrace,et al.  Chunking during serial learning by a pigeon: I. Basic evidence. , 1991, Journal of experimental psychology. Animal behavior processes.

[27]  Doina Precup,et al.  Theoretical Results on Reinforcement Learning with Temporally Abstract Options , 1998, ECML.

[28]  A. Graybiel The Basal Ganglia and Chunking of Action Repertoires , 1998, Neurobiology of Learning and Memory.

[29]  John N. Tsitsiklis,et al.  Actor-Critic Algorithms , 1999, NIPS.

[30]  J Ashe,et al.  Choice and stimulus-response compatibility affect duration of response selection. , 1999, Brain research. Cognitive brain research.

[31]  Willem B. Verwey,et al.  Evidence for a multistage model of practice in a sequential movement task. , 1999 .

[32]  A. Rivlin,et al.  Economic Choices , 2001 .

[33]  John Langford,et al.  PAC-MDL Bounds , 2003, COLT.

[34]  O. Hikosaka,et al.  Chunking during human visuomotor sequence learning , 2003, Experimental Brain Research.

[35]  H. Bergman,et al.  Information processing, dimensionality reduction and reinforcement learning in the basal ganglia , 2003, Progress in Neurobiology.

[36]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[37]  P. Glimcher,et al.  Dynamic response-by-response models of matching behavior in rhesus monkeys. , 2005, Journal of the experimental analysis of behavior.

[38]  Ahmed,et al.  Hierarchical Chunking during Learning of Visuomotor Sequences , 2006, The 2006 IEEE International Joint Conference on Neural Network Proceedings.

[39]  Philip David Zelazo,et al.  The Dimensional Change Card Sort (DCCS): a method of assessing executive function in children , 2006, Nature Protocols.

[40]  H. Robbins A Stochastic Approximation Method , 1951 .

[41]  E. Robertson The Serial Reaction Time Task: Implicit Motor Skill Learning? , 2007, The Journal of Neuroscience.

[42]  J. Tanji,et al.  Categorization of behavioural sequences in the prefrontal cortex , 2007, Nature.

[43]  A. Faisal,et al.  Noise in the nervous system , 2008, Nature Reviews Neuroscience.

[44]  M. Botvinick Hierarchical models of behavior and prefrontal function , 2008, Trends in Cognitive Sciences.

[45]  A M McIntosh,et al.  Working memory in schizophrenia: a meta-analysis , 2008, Psychological Medicine.

[46]  M. Gluck,et al.  Dopaminergic Drugs Modulate Learning Rates and Perseveration in Parkinson's Patients in a Dynamic Foraging Task , 2009, The Journal of Neuroscience.

[47]  R. Seidler,et al.  Visuospatial working memory capacity predicts the organization of acquired explicit motor sequences. , 2009, Journal of neurophysiology.

[48]  B. Balleine,et al.  Evidence of Action Sequence Chunking in Goal-Directed Instrumental Conditioning and Its Dependence on the Dorsomedial Prefrontal Cortex , 2009, The Journal of Neuroscience.

[49]  Timothy F. Brady,et al.  Compression in visual working memory: using statistical regularities to form more efficient memory representations. , 2009, Journal of experimental psychology. General.

[50]  Gasper Tkacik,et al.  Optimal population coding by noisy spiking neurons , 2010, Proceedings of the National Academy of Sciences.

[51]  Xin Jin,et al.  Start/stop signals emerge in nigrostriatal circuits during sequence learning , 2010, Nature.

[52]  Daniel Polani,et al.  Information Theory of Decisions and Actions , 2011 .

[53]  Doina Precup,et al.  An information-theoretic approach to curiosity-driven reinforcement learning , 2012, Theory in Biosciences.

[54]  Naftali Tishby,et al.  Dopaminergic Balance between Reward Maximization and Policy Complexity , 2011, Front. Syst. Neurosci..

[55]  F. Mathy,et al.  What’s magic about magic numbers? Chunking and data compression in short-term memory , 2012, Cognition.

[56]  R. Jacobs,et al.  An ideal observer analysis of visual working memory. , 2012, Psychological review.

[57]  Anne G E Collins,et al.  How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis , 2012, The European journal of neuroscience.

[58]  B. Balleine,et al.  Habits, action sequences and reinforcement learning , 2012, The European journal of neuroscience.

[59]  J. Anguera,et al.  Neurocognitive Contributions to Motor Skill Learning: The Role of Working Memory , 2012, Journal of motor behavior.

[60]  Kyle S. Smith,et al.  A dual operator view of habitual behavior reflecting cortical and striatal dynamics. , 2013, Neuron.

[61]  Xin Jin,et al.  Basal Ganglia Subcircuits Distinctively Encode the Parsing and Concatenation of Action Sequences , 2014, Nature Neuroscience.

[62]  Raymond J. Dolan,et al.  Striatal dysfunction during reversal learning in unmedicated schizophrenia patients , 2014, NeuroImage.

[63]  Anne G E Collins,et al.  Working Memory Contributions to Reinforcement Learning Impairments in Schizophrenia , 2014, The Journal of Neuroscience.

[64]  J. Macke,et al.  Quantifying the effect of intertrial dependence on perceptual decisions. , 2014, Journal of vision.

[65]  Chang-Bing Huang,et al.  The external noise normalized gain profile of spatial vision. , 2014, Journal of vision.

[66]  J D Cohen,et al.  Multitasking versus multiplexing: Toward a normative account of limitations in the simultaneous execution of control-demanding behaviors , 2014, Cognitive, affective & behavioral neuroscience.

[67]  Peter Dayan,et al.  Interplay of approximate planning strategies , 2015, Proceedings of the National Academy of Sciences.

[68]  Filip Matejka,et al.  Rational Inattention to Discrete Choices: A New Foundation for the Multinomial Logit Model , 2011 .

[69]  Max Berniker,et al.  Chunking as the result of an efficiency computation trade-off , 2016, Nature Communications.

[70]  Michael F. Green,et al.  Probabilistic Reversal Learning in Schizophrenia: Stability of Deficits and Potential Causal Mechanisms. , 2016, Schizophrenia bulletin.

[71]  Roy Fox,et al.  Taming the Noise in Reinforcement Learning via Soft Updates , 2015, UAI.

[72]  Timothy A. Wifall,et al.  The roles of stimulus and response uncertainty in forced-choice performance: an amendment to Hick/Hyman Law , 2016, Psychological research.

[73]  C. Sims Rate–distortion theory and human perception , 2016, Cognition.

[74]  Roshan Cools,et al.  Impaired Activation in Cognitive Control Regions Predicts Reversal Learning in Schizophrenia. , 2016, Schizophrenia bulletin.

[75]  R. Moubah,et al.  Long-range magnetic interactions and proximity effects in an amorphous exchange-spring magnet , 2016, Nature Communications.

[76]  Jonathan D. Cohen,et al.  Multitasking Capability Versus Learning Efficiency in Neural Network Architectures , 2017, CogSci.

[77]  R. Hampton,et al.  Change in the relative contributions of habit and working memory facilitates serial reversal learning expertise in rhesus monkeys , 2017, Animal Cognition.

[78]  Chris R. Sims,et al.  Policy Generalization In Capacity-Limited Reinforcement Learning , 2018 .

[79]  Rahul Bhui,et al.  Decision by sampling implements efficient coding of psychoeconomic functions , 2017, bioRxiv.

[80]  Anne G. E. Collins,et al.  The tortoise and the hare: interactions between reinforcement learning and working memory , 2017, bioRxiv.

[81]  Darryl W. Schneider,et al.  Hick’s law for choice reaction time: A review , 2018, Quarterly journal of experimental psychology.

[82]  D. Barch,et al.  Effort-based decision-making in schizophrenia , 2018, Current Opinion in Behavioral Sciences.

[83]  Julie C. Helmers,et al.  Chunking as a rational strategy for lossy data compression in visual working memory , 2017, bioRxiv.

[84]  Jonathan D. Cohen,et al.  Efficiency of learning vs. processing: Towards a normative theory of multitasking , 2020, CogSci.

[85]  S. Gershman The rational analysis of memory , 2019 .

[86]  Lawson L. S. Wong,et al.  State Abstraction as Compression in Apprenticeship Learning , 2019, AAAI.

[87]  Thomas Icard,et al.  Why Be Random? , 2019, Mind.

[88]  Samuel J. Gershman,et al.  The algorithmic architecture of exploration in the human brain , 2019, Current Opinion in Neurobiology.

[89]  James A. Brissenden,et al.  "Memory compression" effects in visual working memory are contingent on explicit long-term memory. , 2019, Journal of experimental psychology. General.

[90]  Massimo Marinacci,et al.  A note on rational inattention and rate distortion theory , 2020 .

[91]  Jordi Grau-Moya,et al.  Soft Q-Learning with Mutual-Information Regularization , 2018, ICLR.

[92]  Christopher J Bates,et al.  Adaptive allocation of human visual working memory capacity during statistical and categorical learning. , 2019, Journal of vision.

[93]  Kevin J. Miller,et al.  Habits without Values , 2016, bioRxiv.

[94]  N. Geard,et al.  Epidemiological consequences of enduring strain-specific immunity requiring repeated episodes of infection , 2020, PLoS computational biology.

[95]  Naftali Tishby,et al.  Value-complexity tradeoff explains mouse navigational learning , 2020, PLoS Comput. Biol..

[96]  Anne G E Collins,et al.  Modeling the influence of working memory, reinforcement, and action uncertainty on reaction time and choice during instrumental learning , 2019, Psychonomic bulletin & review.

[97]  S. Gershman Origin of perseveration in the trade-off between reward and complexity , 2020, Cognition.

[98]  Michael L. Littman,et al.  Successor Features Combine Elements of Model-Free and Model-based Reinforcement Learning , 2019, J. Mach. Learn. Res..

[99]  Balázs Török,et al.  Optimal forgetting: Semantic compression of episodic memories , 2020, PLoS computational biology.

[100]  S. Gershman,et al.  The reward-complexity trade-off in schizophrenia , 2020, bioRxiv.

[101]  Balázs Török,et al.  Optimal forgetting: Semantic compression of episodic memories , 2020, bioRxiv.

[102]  Wanqian Yang,et al.  Discovery of hierarchical representations for efficient planning , 2020, PLoS computational biology.

[103]  Michael J Frank,et al.  Reward-predictive representations generalize across tasks in reinforcement learning , 2020, PLoS computational biology.

[104]  Andrew M. Saxe,et al.  On the Rational Boundedness of Cognitive Control: Shared Versus Separated Representations , 2020 .

[105]  Christopher J Bates,et al.  Efficient data compression in perception and perceptual memory. , 2020, Psychological review.

[106]  D. Norris,et al.  Chunking and data compression in verbal short-term memory , 2020, Cognition.