Poster
The Update Equivalence Framework for Decision-Time Planning
Samuel Sokota · Gabriele Farina · David Wu · Hengyuan Hu · Kevin A. Wang · J Kolter · Noam Brown
Halle B
The process of revising (or constructing) a policy immediately prior to execution---known as decision-time planning---is key to achieving superhuman performance in perfect-information games like chess and Go. A recent line of work has extended decision-time planning to more general imperfect-information games, leading to superhuman performance in poker. However, these methods require considering subgames whose sizes grow quickly in the amount of non-public information, making them unhelpful when the amount of non-public information is large. Motivated by this issue, we introduce an alternative framework for decision-time planning that is not based on subgames but rather on the notion of update equivalence. In this framework, decision-time planning algorithms are designed to replicate, in the limit, updates of global policy learners. Despite its conceptual simplicity, this approach had surprisingly been overlooked in the imperfect-information game literature. It enables us to introduce a new family of principled decision-time planning algorithms that do not rely on public information, opening the door to sound and effective decision-time planning in games with large amounts of non-public information. In experiments, members of this family produce comparable or superior results compared to state-of-the-art approaches in Hanabi and improve performance in 3x3 Abrupt Dark Hex and Phantom Tic-Tac-Toe.