If you have any remarks or questions, or are stuck, message the Education Committee via the Slack channel #ec-helpme or education@serpentineai.nl.
Rank | Name | Avg. reward (100 games) | Avg. reward (250 games) | Avg. reward (500 games) | Git link |
---|---|---|---|---|---|
1 | Imre Schilstra | -31.56 | 4.51 | 7.73 | gitlab |
2 | Base Implementation (Dik) | -86.9 | -23.2 | 3.9 | gitlab |
3 | Random bot (±10 variance) | -775.0 | -775.0 | -755.0 | gitlab |
This was tested using the run.py script found in the skeleton code. Every entry is the average reward over 10 runs, after training for the stated number of games. Send a message in the #ec-helpme Slack channel if you want your bot to be added here.
In this lesson we are going to talk about the theory behind Q-learning, the Taxi-v3 gym environment, and how to implement your own Q-table in Python.
The GitLab links for this lesson are: Browse
The links are currently only available for members.
Q-learning is, in short, a machine learning technique to predict the expected (total) reward an action is going to give. This function can be learned from an environment by trying a lot of different actions in different states. It works in both deterministic and stochastic environments. The following terms will be used:
s = the current state of the environment
a = the action for which the reward is being calculated
Q(s, a) = the ''expected future reward'' given a specific state and action, for a function Q.
The Q function can be improved iteratively by trying a certain action and updating the function based on the actually received reward. This alone, however, does not take into account rewards received in the future, while the best action to pick is not always the action that gets you the most immediate reward. Therefore, to account for rewards we are able to get in the future, we also add a (discounted) expected future reward from our next state s' (s prime), which our Q function happens to predict; the maximum Q value is taken over all possible actions a'. For the discount factor, the symbol γ (gamma) is usually used. As our Q function does not start out perfect, we do not want to completely replace the old value with the newly estimated reward; therefore we add a learning rate, called α (alpha), to our update function. Together this gives the Q-learning update rule:

Q(s, a) ← Q(s, a) + α · (r + γ · max_a' Q(s', a') − Q(s, a))

where r is the immediate reward received for taking action a in state s.
We can store this Q function in a 2D array in which the first index is the state s and the second index is the action a.
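As a concrete illustration, here is a minimal numpy sketch of such a Q-table together with the update rule above. The hyperparameter values α = 0.1 and γ = 0.9 are illustrative choices, not values prescribed by this lesson:

```python
import numpy as np

n_states = 500   # Taxi-v3 observations are integers 0..499
n_actions = 6    # Taxi-v3 has 6 discrete actions

# The Q-table: one row per state, one column per action, initialised to zero.
q_table = np.zeros((n_states, n_actions))

alpha = 0.1      # learning rate (illustrative value)
gamma = 0.9      # discount factor (illustrative value)

def update(state, action, reward, next_state):
    """Apply one Q-learning update for a single (s, a, r, s') transition."""
    best_next = np.max(q_table[next_state])   # max over a' of Q(s', a')
    target = reward + gamma * best_next       # immediate + discounted future reward
    q_table[state, action] += alpha * (target - q_table[state, action])
```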
For further reading you can have a look at the Wikipedia page about Q-learning, or take a look at an alternative explanation by Percy Jaiswal in his blog post ''Getting Started with Reinforcement Q Learning''.
If you are not yet familiar with gym, please have a look at our gym explanation, as we are going to use it. Specifically, we are going to use the Taxi-v3 environment, which is very useful because the observation given by the env is a single integer ranging from 0 to 499 (500 states), and there are 6 possible actions. This leads to a relatively small Q-table of (just) 3000 values.
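For reference, interacting with the environment looks roughly like this, assuming the classic gym API (before gym 0.26; in newer gym/gymnasium versions, reset() returns an (observation, info) tuple and step() returns five values):

```python
import gym

env = gym.make("Taxi-v3")

print(env.observation_space)  # Discrete(500): one integer per state
print(env.action_space)       # Discrete(6): six possible actions

state = env.reset()                           # an integer in [0, 500)
action = env.action_space.sample()            # pick a random action
state, reward, done, info = env.step(action)  # classic 4-tuple gym API
```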
To get a little head start, here is the skeleton code for Taxi. You can clone that repository, create your own branch named <your_name> (first and last), and push your code every time you finish a step. This way the education committee can provide you with feedback. You are going to implement a Q-table using the abstractTrainingClass.
Currently the above head start is for members only; otherwise, view it as an extra challenge.
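If you just want to see how the pieces come together, below is a rough, standalone training-loop sketch. It does not use the skeleton's abstractTrainingClass, it assumes the classic gym API mentioned above, and the hyperparameters (α, γ, and the exploration rate ε) are illustrative assumptions:

```python
import gym
import numpy as np

env = gym.make("Taxi-v3")
q_table = np.zeros((env.observation_space.n, env.action_space.n))

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # illustrative hyperparameters
n_games = 500

for game in range(n_games):
    state = env.reset()   # classic gym API: reset() returns the observation
    done = False
    while not done:
        # Epsilon-greedy: mostly exploit the table, sometimes explore.
        if np.random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))

        next_state, reward, done, info = env.step(action)

        # The Q-learning update rule from the theory section above.
        best_next = np.max(q_table[next_state])
        q_table[state, action] += alpha * (reward + gamma * best_next - q_table[state, action])

        state = next_state
```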
In this lesson we looked at how to implement a Q-table in Python using numpy, and got familiar with the gym API. We created a machine learning algorithm that is able to beat the game Taxi. Q-tables work great for games with a limited state space and action space, such as Taxi, but will fail when the state space gets too big.
Explanation | Git command |
---|---|
Clone the repository | git clone https://gitlab.serpentineai.nl/education/QLearning/qtable-taxi |
Create a new branch with the name <your_name> | git checkout -b <your_name> |
Check out an existing branch (e.g., master) | git checkout master |
Add your changes | git add . |
Commit your changes | git commit -m "Your commit message" |
Push for the first time | git push -u origin <your_name> |
Push after that | git push |
The -u is shorthand for --set-upstream, which links your local branch to the corresponding branch on the Git server.
The following hotkeys might help:
Key combination | Execute command |
---|---|
CTRL + K | Commit progress (equivalent to git add . followed by git commit -m "...") |
CTRL + SHIFT + K | Push progress (equivalent to git push; the first time, git push -u origin <your_name>) |