[1] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 2022 .
[2] N. Kiefer,et al. Controlling a Stochastic Process with Unknown Parameters , 1988 .
[3] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem , 2004, NIPS.
[4] Don H. Johnson,et al. Symmetrizing the Kullback-Leibler Distance , 2001 .
[5] J. Michael Harrison,et al. Bayesian Dynamic Pricing Policies: Learning and Earning Under a Binary Prior Distribution , 2011, Manag. Sci.
[6] Omar Besbes,et al. Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms , 2009, Oper. Res.
[7] Frank Thomson Leighton,et al. The value of knowing a demand curve: bounds on regret for online posted-price auctions , 2003, 44th Annual IEEE Symposium on Foundations of Computer Science.
[8] Eric W. Cope,et al. Regret and Convergence Bounds for a Class of Continuum-Armed Bandit Problems , 2009, IEEE Transactions on Automatic Control.
[9] Peter Auer,et al. Improved Rates for the Stochastic Continuum-Armed Bandit Problem , 2007, COLT.
[10] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[11] R. Agrawal. The Continuum-Armed Bandit Problem , 1995 .
[12] Vincent K. N. Lau,et al. Distributive Stochastic Learning for Delay-Optimal OFDMA Power and Subband Allocation , 2010, IEEE Transactions on Signal Processing.
[13] Yossi Aviv,et al. A Partially Observed Markov Decision Process for Dynamic Pricing , 2005, Manag. Sci.
[14] R. Bellman. A Problem in the Sequential Design of Experiments , 1954 .
[15] K. Arrow,et al. A Two-Armed Bandit Theory of Market , 2003 .
[16] Benjamin Van Roy,et al. Dynamic Pricing with a Prior on Market Response , 2010, Oper. Res.
[17] Bhaskar Krishnamachari,et al. Dynamic Multichannel Access With Imperfect Channel State Detection , 2010, IEEE Transactions on Signal Processing.
[18] Sven Rady,et al. Optimal Experimentation in a Changing Environment , 1997 .
[19] Andrew V. Goldberg,et al. Competitive auctions and digital goods , 2001, SODA '01.
[20] P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .
[21] Mihaela van der Schaar,et al. Decomposition Principles and Online Learning in Cross-Layer Optimization for Delay-Sensitive Applications , 2008, IEEE Transactions on Signal Processing.
[22] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon , 1973, Oper. Res.
[23] A. McLennan. Price dispersion and incomplete learning in the long run , 1984 .
[24] Felix Wu,et al. Incentive-compatible online auctions for digital goods , 2002, SODA '02.
[25] R. A. Leibler,et al. On Information and Sufficiency , 1951 .
[26] John N. Tsitsiklis,et al. The Complexity of Markov Decision Processes , 1987, Math. Oper. Res.
[27] Amos Fiat,et al. Competitive generalized auctions , 2002, STOC '02.
[28] Robin J. Evans,et al. Hidden Markov model multiarm bandits: a methodology for beam scheduling in multitarget tracking , 2001, IEEE Trans. Signal Process.
[29] Vijay Kumar,et al. Online learning in online auctions , 2003, SODA '03.
[30] Hyundong Shin,et al. Sensing and Probing Cardinalities for Active Cognitive Radios , 2012, IEEE Transactions on Signal Processing.
[31] H. Robbins,et al. Asymptotically efficient adaptive allocation rules , 1985 .
[32] H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the sum of Observations , 1952 .
[33] Huaiyu Zhu. On Information and Sufficiency , 1997 .
[34] B. Jullien,et al. Optimal Learning by Experimentation , 1991 .