In Serpentine a lot of toy problems come from OpenAI Gym. There are many different gym environments, and the most used ones are the Atari games. OpenAI Gym only provides the environments; we have to write the algorithms that play the games well.
For the basic information take a look at the OpenAI Gym documentation. In the remainder of this tutorial we explain the installation of the Atari environments, walk through a basic game loop from gym, and share some handy information and extra examples.
Before you install any new packages, always check that you are in the right virtual environment. This prevents packages from breaking due to conflicting version requirements. To install gym, type the following either in the PyCharm terminal (located on the bottom left) or in your virtual environment command prompt when not using PyCharm:
```
pip install gym
```
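After installing, you can quickly check which gym version ended up in your environment (a small sanity check, nothing more):

```python
import gym

# Print the installed gym version to confirm the installation succeeded.
print(gym.__version__)
```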
For the installation of the Atari gym environments on Windows there is a separate guide available here; this has to do with the cmake toolchain that the Atari environments rely on. Other users (Linux, Mac) can simply type the following line for the Atari environments:
```
pip install gym[atari]
```
Now, to check if this worked, go to the Python console (next to the Terminal button in PyCharm), or type `python` when you are using the command prompt interface.
You should be able to run the following Python code without any errors:
```python
import gym

env = gym.make("Pong-v0")
```
In case of errors, try to install the above packages again, or contact the education committee via the Slack channel #ec-helpme or email education@serpentineai.nl.
To get familiar with gym, read the following sections and try to implement the examples yourself.
The basic gym documentation gives the following example of a game loop.
```python
import gym
import time

env = gym.make("CartPole-v1")
observation = env.reset()

for _ in range(1000):
    env.render()
    action = env.action_space.sample()  # your agent here (this takes random actions)
    observation, reward, done, info = env.step(action)

    if done:
        time.sleep(1)
        observation = env.reset()

env.close()
```
Let us explain what is happening in every line.
| Line | Explanation |
|---|---|
| 1 | Import the gym module. |
| 2 | Import the time module, used to slow down the rendering of the game. |
| 4 | Create a gym environment with the id `"CartPole-v1"`. |
| 5 | Reset the game (always required before calling step); this returns an observation. |
| 7 | Loop for 1000 steps (or frames) in the environment. |
| 8 | Render the environment to the screen (this can be text or a pop-up window with an image). |
| 9 | Pick a random action from the possible action space (this action space can differ per environment). |
| 10 | Perform the step with the chosen action and get new information from the environment. |
| 12 | Check if the game is finished. |
| 13 | Wait for 1 second, so the user can see that the game is done. |
| 14 | Reset the game after it is done, otherwise the game will crash on the next call to step. |
| 16 | Close the game environment. |
The information you receive on line 10 entails (for this game, CartPole, specifically):

- `observation`: a numpy array of length 4
- `reward`: a float value
- `done`: a boolean
- `info`: a dictionary (empty for this game)
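The four observation values for CartPole are, in order, the cart position, cart velocity, pole angle and pole angular velocity. As a minimal sketch of what could replace the random action on line 9 (this heuristic is our own illustration, not part of the gym documentation), you can push the cart towards the side the pole is leaning to:

```python
import gym

env = gym.make("CartPole-v1")
observation = env.reset()

for _ in range(1000):
    env.render()
    # Heuristic instead of a random sample: observation[2] is the pole angle,
    # action 0 pushes the cart to the left and action 1 pushes it to the right.
    action = 0 if observation[2] < 0 else 1
    observation, reward, done, info = env.step(action)

    if done:
        observation = env.reset()

env.close()
```

This simple rule should already keep the pole up longer than random actions.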
For an overview of all possible games you can use the next line:
print("All gym games:\n\t", "\n\t".join([each for each in gym.envs.registry.env_specs.keys()]))
The id that you used to create an environment can be retrieved from the environment itself:

```python
env = gym.make(id="MsPacman-v0")
print(env.spec.id)
```
Often it is handy for a model to know the observation space and the action space. To get the shape of the observation space use `env.observation_space.shape`, and for the number of actions use `env.action_space.n`. In some games it is also possible to get the action meanings in a human-readable form (e.g. `UP` instead of 2). This can be achieved for at least all Atari games using `env.unwrapped.get_action_meanings()`. A minimal code example is given below.
```python
env = gym.make(id="MsPacman-v0")

print(env.observation_space.shape)
print(env.action_space.n)
print(env.unwrapped.get_action_meanings())
```
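For MsPacman-v0 this should print something along these lines (the exact values can differ per game and gym version):

```
(210, 160, 3)
9
['NOOP', 'UP', 'RIGHT', 'LEFT', 'DOWN', 'UPRIGHT', 'UPLEFT', 'DOWNRIGHT', 'DOWNLEFT']
```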
This runs the Pacman game once.
```python
import gym

env = gym.make(id="MsPacman-v0")

# A single loop example
done = False
env.reset()
score = 0

while not done:
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action=action)
    env.render()
    score += reward

env.close()
print(f"\nFinished one game of {env.spec.id}, with a score of {score}.")
```
This runs Pacman 5 times without rendering.
```python
import gym

env = gym.make(id="MsPacman-v0")
nr_games = 5

print(f"\nPlaying {nr_games} games of {env.spec.id}:")
for episode in range(1, nr_games + 1):
    done = False
    env.reset()
    score = 0

    while not done:
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action=action)
        score += reward

    print(f"\tFinished episode: {episode}, score: {'%4d' % score}")

env.close()  # clean up the environment when all episodes are done
```