Dear Prof. Garke,
I have a question about the first task of the current sheet.
While my implementation of the value iteration algorithm works fine for TicTacToe, it does not work for LGame.
While experimenting with LGame in the given template, I noticed that in every state in LGame.unique_states the player to move is always Player 1.
Similarly, the only possible winner in any of these states is Player 2.
The reward in the Bellman equation will therefore always be zero, since Player 1 can never win (Player 2 never moves and therefore cannot lose). I am sure I am misunderstanding something, but I cannot figure out what.
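To make the concern concrete, here is a minimal sketch of the kind of update I mean. It is not my actual submission and does not use the template's API: the names `value_iteration`, `actions`, `step` and `reward` are hypothetical, and the "game" is just a three-state chain. It only illustrates that when the reward term is zero for every transition, the values never leave zero.

```python
def value_iteration(states, actions, step, reward, gamma=0.9, sweeps=50):
    """Value iteration on a small deterministic model (illustrative only).

    states        -- list of hashable states
    actions(s)    -- list of actions available in s (empty if s is terminal)
    step(s, a)    -- successor state reached by taking a in s
    reward(s,a,t) -- immediate reward for the transition s -> t
    """
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            acts = actions(s)
            if acts:  # terminal states keep value 0
                V[s] = max(
                    reward(s, a, step(s, a)) + gamma * V[step(s, a)]
                    for a in acts
                )
    return V


# Hypothetical toy chain 0 -> 1 -> 2, where state 2 is terminal.
states = [0, 1, 2]
actions = lambda s: ["go"] if s < 2 else []
step = lambda s, a: s + 1

# Case A: no transition is ever rewarded (what I seem to get for LGame).
V_zero = value_iteration(states, actions, step, lambda s, a, t: 0.0)

# Case B: reaching the terminal state is rewarded.
V_win = value_iteration(
    states, actions, step, lambda s, a, t: 1.0 if t == 2 else 0.0
)

print(V_zero)
print(V_win)
```

In case A every value stays at zero, which matches what I observe for LGame; in case B the reward propagates backwards through the chain as expected.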
Thank you for your time,
Best regards,
Valerio Cini