Hey,
The videos show the best results. For the race, however, only a very simple network was used, which probably explains the result. For the mountain problem, the cause is probably that a reward is given only for reaching the exact target point, and this was never achieved during training.
The file "ReinforcementLearning1_2_3" contains the solutions to 1, 2 and 3, and the other file contains the solution to 4.
In
Matthias Ollech