[Mllab] exercise sheet 5 - doubt
Dear Prof. Garke,

I have a question about the first task of the current sheet. While my implementation of the value iteration algorithm works fine for TicTacToe, it does not work for the LGame. Playing around with the LGame in the given template, I noticed that in every state in LGame.unique_states the player to move is always Player 1, and the only possible winner in all of these states is Player 2. In the Bellman equation the reward will therefore always be zero, since Player 1 can never win (Player 2 never moves and therefore cannot lose). I am sure I am misunderstanding something, but I am not able to figure out what.

Thank you for your time,
Best regards,
Valerio Cini
Hi Valerio,
> Playing around with the LGame in the given template, I have seen that in all the states of LGame.unique_states the player who has to play is always Player 1. Similarly the only possible winner in all the states is Player 2.
this is on purpose. The unique_states list contains one representative from each equivalence class of states with equal value. In Tic-tac-toe, the board positions player 1 can see are different from the positions player 2 can see; for example, only the starting player sees the empty board. In the L-Game, however, it is possible to return to the starting position but with player 1 and player 2 switched. Consequently, we do not have to distinguish between positions for player 1 and positions for player 2. W.l.o.g. we can always assume to be player 1; more precisely, we only need to compute the value for the positions of one player, since the other player has identical positions. Storing states for player 2 is not needed, as they would have the same value as the corresponding state with the players switched. Because you can return to the starting position with the players switched during the game, it follows that for every state in the state space there is a state with the players swapped. Strategically, the player number therefore does not matter. In Tic-tac-toe this is different.
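To make the equivalence concrete, here is a minimal sketch of the player-swap canonicalisation. The (board, to_move) layout and the cell encoding (0 empty, 1/2 player marks) are assumptions for illustration, not the template's actual representation:

```python
# Hypothetical state layout: (board, to_move), where board cells are
# 0 (empty), 1 (player 1's mark) or 2 (player 2's mark).

def swap_players(board):
    """Exchange the marks of player 1 and player 2."""
    return tuple({0: 0, 1: 2, 2: 1}[c] for c in board)

def canonical(state):
    """Representative of a state's equivalence class: player 1 to move."""
    board, to_move = state
    return (swap_players(board), 1) if to_move == 2 else state

# The same position with the players exchanged maps to one representative,
# so only positions with player 1 to move need to be stored:
s_player2 = ((1, 2, 0), 2)
s_player1 = ((2, 1, 0), 1)
assert canonical(s_player2) == canonical(s_player1) == s_player1
```

This also explains the observation about the winner: a terminal position where some player has just won is stored with the players swapped so that the loser is to move, i.e. the stored winner is always "the other" player.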
> In the Bellman equation the reward therefore will always be zero as Player 1 can never win (since Player 2 never moves and therefore cannot lose). I am sure I am misunderstanding something but I am not able to figure out what.
Let s be a state where player 2 has to move. The corresponding unique state ss is then (potentially) a reflection or rotation of s with the players switched. By the assumption on the equivalence classes, the value of s is then equal to the value of ss. I hope this helps. Should you still have issues, you can send me your code and I will try to give you a hint about what might be wrong.

Kind regards,
Jannik Schürg
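In code, the lookup through the unique state could look roughly like the sketch below. This is not the sheet's actual interface; the state layout, the negamax sign convention (each value is from the perspective of the player to move), and the helper names are all assumptions made for illustration:

```python
# Sketch only: negamax-style value iteration where states that differ
# by a player swap share one table entry, as in LGame.unique_states.
# States are (board, to_move); cells: 0 empty, 1/2 player marks.

def swap_players(board):
    return tuple({0: 0, 1: 2, 2: 1}[c] for c in board)

def canonical(state):
    """Representative of the equivalence class: player 1 to move."""
    board, to_move = state
    return (swap_players(board), 1) if to_move == 2 else state

def value_iteration(states, successors, reward, iters=50):
    # V is indexed by canonical states only; any state s is looked up
    # via V[canonical(s)], since s and its unique state have equal value.
    V = {canonical(s): 0.0 for s in states}
    for _ in range(iters):
        for s in V:
            succ = successors(s)
            if succ:
                # The mover maximises; a successor's stored value is from
                # the opponent's perspective, hence the sign flip.
                V[s] = max(-V[canonical(s2)] for s2 in succ)
            else:
                V[s] = reward(s)
    return V
```

The key point for the question above is the `V[canonical(s2)]` lookup: successors where player 2 moves are never stored directly, but their values are still reachable through the player-swapped representative.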