Djallel Bouneffouf | Jenna M. Reinen | Irina Rish | Baihan Lin | Guillermo Cecchi
[1] P. Glimcher, et al. Midbrain Dopamine Neurons Encode a Quantitative Reward Prediction Error Signal, 2005, Neuron.
[2] John Langford, et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information, 2007, NIPS.
[3] Xinxin Zhang, et al. VoiceID on the Fly: A Speaker Recognition System that Learns from Scratch, 2020, INTERSPEECH.
[4] Karl J. Friston, et al. Dissociable Roles of Ventral and Dorsal Striatum in Instrumental Conditioning, 2004, Science.
[5] John Langford, et al. Contextual Bandit Algorithms with Supervised Learning Guarantees, 2010, AISTATS.
[6] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[7] Peter Dayan, et al. A Neural Substrate of Prediction and Reward, 1997, Science.
[8] Nicolò Cesa-Bianchi, et al. On-line learning with malicious noise and the closure algorithm, 1998, Annals of Mathematics and Artificial Intelligence.
[9] Djallel Bouneffouf, et al. Optimal Epidemic Control as a Contextual Combinatorial Bandit with Budget, 2021, ArXiv.
[10] A. Damasio, et al. Insensitivity to future consequences following damage to human prefrontal cortex, 1994, Cognition.
[11] Baihan Lin, et al. Speaker Diarization as a Fully Online Learning Problem in MiniVox, 2020, ArXiv.
[12] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[13] P. Glimcher, et al. Phasic Dopamine Release in the Rat Nucleus Accumbens Symmetrically Encodes a Reward Prediction Error Term, 2014, The Journal of Neuroscience.
[14] A. Holmes, et al. The Myth of Optimality in Clinical Neuroscience, 2018, Trends in Cognitive Sciences.
[15] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2011, WSDM '11.
[16] H. Robbins, et al. Asymptotically efficient adaptive allocation rules, 1985, Advances in Applied Mathematics.
[17] Djallel Bouneffouf, et al. Split Q Learning: Reinforcement Learning with Two-Stream Rewards, 2019, IJCAI.
[18] P. Dayan, et al. Reinforcement learning: The Good, The Bad and The Ugly, 2008, Current Opinion in Neurobiology.
[19] Djallel Bouneffouf, et al. Contextual Bandit with Adaptive Feature Extraction, 2018, 2018 IEEE International Conference on Data Mining Workshops (ICDMW).
[20] Baihan Lin. Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward, 2020, Australasian Conference on Artificial Intelligence.
[21] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[22] M. Frank, et al. From reinforcement learning models to psychiatric and neurological disorders, 2011, Nature Neuroscience.
[23] Woojae Kim, et al. Cognitive Mechanisms Underlying Risky Decision-Making in Chronic Cannabis Users, 2010, Journal of Mathematical Psychology.
[24] Yishay Mansour, et al. Learning Rates for Q-learning, 2004, J. Mach. Learn. Res.
[25] A. Tversky, et al. The framing of decisions and the psychology of choice, 1981, Science.
[26] Shipra Agrawal, et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[27] Michael J. Frank, et al. By Carrot or by Stick: Cognitive Reinforcement Learning in Parkinsonism, 2004, Science.
[28] Baihan Lin, et al. Predicting human decision making in psychological tasks with recurrent neural networks, 2020, ArXiv.
[29] Djallel Bouneffouf, et al. Bandit Models of Human Behavior: Reward Processing in Mental Disorders, 2017, AGI.
[30] W. R. Thompson. On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933, Biometrika.
[31] Baihan Lin, et al. Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior, 2020, ArXiv.
[32] Raphaël Féraud, et al. Multi-armed bandit problem with known trend, 2015, Neurocomputing.
[33] J. Langford, et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS.
[34] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[35] Stefan Elfwing, et al. Parallel reward and punishment control in humans and robots: Safe reinforcement learning using the MaxPain algorithm, 2017, 2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob).
[36] Arno Villringer, et al. Iowa Gambling Task: There is More to Consider than Long-Term Outcome. Using a Linear Equation Model to Disentangle the Impact of Outcome and Frequency of Gains and Losses, 2012, Front. Neurosci.
[37] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[38] J. Kramer, et al. Reward processing in neurodegenerative disease, 2015, Neurocase.
[39] James L. McClelland, et al. Data from 617 Healthy Participants Performing the Iowa Gambling Task: A “Many Labs” Collaboration, 2015.
[40] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[41] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[42] Michael J. Frank, et al. A mechanistic account of striatal dopamine function in human cognition: psychopharmacological studies with cabergoline and haloperidol, 2006, Behavioral Neuroscience.
[43] Raphaël Féraud, et al. Context Attentive Bandits: Contextual Bandit with Restricted Context, 2017, IJCAI.
[44] R. Dolan, et al. The neurobiology of punishment, 2007, Nature Reviews Neuroscience.