Convergence, Targeted Optimality, and Safety in Multiagent Learning

In the previous chapter, we presented LoE-AIM, an algorithm that models memory-bounded agents under the assumption that their memory size is known beforehand. When such prior knowledge is unavailable, one possible solution is to use a very large memory size that serves as a conservative upper bound on the true, unknown memory size.
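The idea of over-provisioning memory can be sketched in Python. This is a minimal illustration under stated assumptions, not LoE-AIM itself: the two-action game, the tit-for-tat opponent, and the names `MemoryBoundedModel`, `tit_for_tat`, and `simulate` are all hypothetical. The model conditions on the last `memory` joint actions. Because an agent with memory k ignores everything older than k steps, a model with any memory K >= k can still represent that agent's behavior exactly.

```python
import random
from collections import Counter, defaultdict

ACTIONS = (0, 1)  # hypothetical two-action game


def tit_for_tat(history):
    # Hypothetical memory-1 agent: repeats the learner's previous action.
    # `history` is a list of joint actions (learner_action, other_action).
    return history[-1][0] if history else 0


class MemoryBoundedModel:
    """Frequency model of the other agent that conditions on the last
    `memory` joint actions (illustrative sketch, not LoE-AIM)."""

    def __init__(self, memory):
        self.memory = memory
        self.counts = defaultdict(Counter)  # context -> action counts

    def context(self, history):
        return tuple(history[-self.memory:])

    def update(self, history, other_action):
        # Only record once a full context of `memory` joint actions exists.
        if len(history) >= self.memory:
            self.counts[self.context(history)][other_action] += 1

    def predict(self, history):
        c = self.counts.get(self.context(history))
        return c.most_common(1)[0][0] if c else None

    def num_contexts(self):
        # Upper bound on distinct contexts: |A x A| ** memory.
        return (len(ACTIONS) ** 2) ** self.memory


def simulate(memory, steps=2000, seed=0):
    # Learner plays uniformly at random; the model observes the opponent.
    rng = random.Random(seed)
    model = MemoryBoundedModel(memory)
    history = []
    for _ in range(steps):
        other = tit_for_tat(history)
        model.update(history, other)
        mine = rng.choice(ACTIONS)
        history.append((mine, other))
    return model


exact = simulate(memory=1)         # true memory size
conservative = simulate(memory=3)  # conservative upper bound
print(exact.num_contexts(), conservative.num_contexts())  # 4 64
```

With K = 3 every observed context still yields the correct, deterministic prediction, but the same behavior is spread over many more contexts, so each context receives fewer samples. The illustration suggests the trade-off of a conservative upper bound: correctness is preserved while the amount of experience needed per context grows with K.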