Paper information - Theoretical guarantees for algorithms in multi-agent settings

In this thesis we develop and analyze algorithms in multi-agent settings. We focus on two areas: auctions, negotiation, and exchanges, which are of increasing interest to computer scientists with the rise of e-commerce; and repeated bimatrix games and multi-stage games, which provide an interesting test bed for agents that will interact with other intelligent agents. The main thrust of this thesis is designing algorithms for agents that are missing critical information about the environment or other agents (such as what will happen in the future, or what the other agents' motivations are) yet perform well compared to the optimal behavior that has this information. In the area of auction design, we consider online double auctions (exchanges) for commodities, in which buyers and sellers trading indistinguishable goods arrive and depart over time. We consider this from the perspective of the broker, who decides which trades occur and what payments are made. We also consider combinatorial auctions. We show a connection between query learning in machine learning theory and preference elicitation: certain natural hypothesis spaces that can be learned with membership queries correspond to natural classes of preferences that can be learned with value queries. Finally, we consider repeated bimatrix games. One would like two reasonable agents who encounter each other many times in the same setting (e.g., a bimatrix game) to eventually perform well together. We show how a simple gradient ascent technique performs well in a bimatrix game, as well as in an arbitrary online convex programming domain. One of the hardest parts of working in a domain with other intelligent agents is defining what it means to perform well and understanding which assumptions are appropriate to make about the other agent. It is well known that there exists an algorithm that has no regret against an arbitrary algorithm. Furthermore, it is not hard to show that there exists an algorithm that achieves the minimum Nash equilibrium value against a no-external-regret algorithm. However, we show here that no algorithm can achieve both these guarantees. (Abstract shortened by UMI.)
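
The gradient technique and the notion of regret mentioned in the abstract can be illustrated with a small sketch. The code below is not taken from the thesis; it is a minimal example of projected online gradient descent on linear costs, under assumptions chosen for illustration: the feasible set is the box [0, 1]^d, the step size at round t is 1/sqrt(t), and the names `project_to_box`, `online_gradient_descent`, `best_fixed_cost`, and `regret` are hypothetical. Regret here means the learner's total cost minus the cost of the best single point chosen in hindsight.

```python
import math


def project_to_box(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d (coordinate-wise clipping)."""
    return [min(hi, max(lo, xi)) for xi in x]


def online_gradient_descent(cost_vectors, dim):
    """Play against a sequence of linear costs c_t(x) = <c_t, x>.

    Assumption for this sketch: the learner starts at the box center and
    uses step size 1/sqrt(t) in round t. Returns the total cost incurred.
    """
    x = [0.5] * dim
    total_cost = 0.0
    for t, c in enumerate(cost_vectors, start=1):
        # The cost is revealed only after x has been committed.
        total_cost += sum(ci * xi for ci, xi in zip(c, x))
        eta = 1.0 / math.sqrt(t)
        # Gradient of a linear cost is the cost vector itself.
        x = project_to_box([xi - eta * ci for xi, ci in zip(x, c)])
    return total_cost


def best_fixed_cost(cost_vectors, dim):
    """Cost of the best fixed point in hindsight over the box [0, 1]^d.

    For linear costs each coordinate is chosen independently: 1 if its
    summed cost is negative, otherwise 0.
    """
    sums = [sum(c[i] for c in cost_vectors) for i in range(dim)]
    return sum(min(0.0, s) for s in sums)


def regret(cost_vectors, dim):
    """External regret: learner's cost minus the best fixed point's cost."""
    return online_gradient_descent(cost_vectors, dim) - best_fixed_cost(cost_vectors, dim)


if __name__ == "__main__":
    T = 1000
    # A hypothetical adversary that alternates the sign of the cost each round.
    costs = [[1.0] if t % 2 == 0 else [-1.0] for t in range(T)]
    r = regret(costs, dim=1)
    print(f"regret after {T} rounds: {r:.3f}, average per round: {r / T:.5f}")
```

With a step size that decays like 1/sqrt(t), the total regret in this setting grows on the order of sqrt(T), so the average regret per round tends to zero; that is the sense in which such an algorithm is said to have no regret.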

Martin A. Zinkevich | Avrim Blum
