In recent years, deep neural networks for strategy games have made significant progress. AlphaZero-like frameworks, which combine Monte-Carlo tree search with reinforcement learning, have been successfully applied to numerous games with perfect information. However, they have not been developed for domains where uncertainty and unknowns abound, and are therefore often considered unsuitable due to imperfect observations. Here, we challenge this view and argue that they are a viable alternative for games with imperfect information, a domain currently dominated by heuristic approaches or methods explicitly designed for hidden information, such as oracle-based techniques. To this end, we introduce a novel algorithm based solely on reinforcement learning, called AlphaZe∗∗, which is an AlphaZero-based framework for games with imperfect information. We examine its learning convergence on the games Stratego and DarkHex and show that it is a surprisingly strong baseline while using a model-based approach: it achieves similar win rates against other Stratego bots such as Pipeline Policy Space Response Oracle (P2SRO), although it does not win in direct comparison against P2SRO or reach the much stronger numbers of DeepNash. Compared to heuristics and oracle-based approaches, AlphaZe∗∗ can easily deal with rule changes, e.g., when more information than usual is given, and drastically outperforms other approaches in this respect.

Neural networks combined with Monte-Carlo Tree Search (MCTS) have become standard in many games with perfect information such as chess and Go, but have been applied less successfully to games with imperfect information (Brown et al., 2020). AlphaZero and its predecessor AlphaGo mark a breakthrough in this area by showing that it is possible to learn a game with perfect information from zero to human level (or even beyond) by combining reinforcement learning (RL) with MCTS (Silver et al., 2017). However, it is not possible to apply this type of method directly to games with imperfect information, since the planning is based on a perfectly observable environment. In recent years, projects like AlphaStar (Vinyals et al., 2019), OpenAI Five (Berner et al., 2019), and Pluribus (Brown and Sandholm, 2019) demonstrated that games with imperfect information can be learned and played, with great computational effort and in a short time, up to the human level and beyond. One challenge of many current state-of-the-art methods is dealing with the unknown. For some, it is not clear whether they converge to a Nash equilibrium or merely aim to develop strong strategies.

To provide a flexible but strong foundation, we introduce the AlphaZero-like framework AlphaZe∗∗ for imperfect information games, which allows us to easily adapt frameworks for perfect information games such as AlphaZero (Silver et al., 2018) and CrazyAra (Czech et al., 2020), building bridges between current successes in the field of perfect information games and the unpredictability of hidden information. To achieve this, we stay true to the model-based nature of AlphaZero and focus on the planning algorithm. We replace MCTS in AlphaZero with our own adaptation of Perfect Information Monte-Carlo (PIMC) (Levy, 1989) because, among other reasons, its memory consumption scales better with the action and state spaces than that of similar methods such as Counterfactual Regret Minimization (Long et al., 2010), and it remains similar to MCTS, making it easier to adapt further methods. We call our adaptation Policy Combining PIMC (PC-PIMC), as it merges the results of multiple searches into one policy. We also make use of a technique we call TrueSight Learning (TSL), which improves learning performance in the early stages of zero-knowledge training. Brown et al. (2020) claim that methods such as AlphaZero are not sound in imperfect information environments because the value of an action may depend on the probability that it will be chosen. In chess, for example, a good move is good whether or not it is played, but in games like poker, where players can bluff, this is not the case. We tackle this problem by encoding the hidden information in the input representation within the process.
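To make the policy-combination idea behind PC-PIMC concrete, here is a minimal Python sketch. The helpers `sample_determinization` and `run_search` are hypothetical placeholders (not part of AlphaZe∗∗'s actual API), and uniform averaging is only one plausible way to merge the per-world policies, not necessarily the paper's exact scheme.

```python
import random
from collections import defaultdict

def pc_pimc_policy(observation, sample_determinization, run_search, num_worlds=10):
    """Sketch of Policy Combining PIMC (PC-PIMC).

    Samples several perfect-information 'determinizations' (full states
    consistent with the current observation), runs a perfect-information
    search on each, and merges the per-world policies into one.
    """
    combined = defaultdict(float)
    for _ in range(num_worlds):
        world = sample_determinization(observation)  # one possible true state
        policy = run_search(world)                   # {action: probability}
        for action, prob in policy.items():
            combined[action] += prob / num_worlds    # uniform average over worlds
    # Renormalize, since some actions may be legal in only a few worlds.
    total = sum(combined.values())
    return {action: prob / total for action, prob in combined.items()}

# Toy usage with stub helpers: two equally likely hidden worlds whose
# "search result" is just a fixed policy.
worlds = [{"a": 1.0}, {"a": 0.5, "b": 0.5}]
policy = pc_pimc_policy(
    None,
    sample_determinization=lambda obs: random.choice(worlds),
    run_search=lambda world: world,
    num_worlds=100,
)
```

In a real implementation, `run_search` would be a PUCT-style MCTS guided by the network's policy and value heads; the sketch only illustrates the combination step.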
Following a model-based approach is a sensible idea for situations with imperfect information. Indeed, one may consider using a model-free approach such as DQN (Mnih et al., 2015) or DeepNash (Perolat et al., 2022). However, such methods arguably require more samples than model-based approaches. In turn, they are rather expensive and likely to underperform on large problems.