Welcome to the first lesson of Pommerman AI. In case you had any problems setting up Python, PyCharm or the Pommerman environment, please inform the education committee so we can adjust the guides.
Any remarks or questions about these tutorials are always welcome, preferably via the Slack channel wiki-content-feedback. (You can notify the education committee specifically by adding @educo to your message.) You may also send us an email at education@serpentineai.nl.
We are going to make a self-learning AI for the free-for-all game mode in Pommerman, and this lesson series will guide you in creating your own version of it. We will start from the same basis as the classic agent.
In this first lesson we are going to cover some machine learning terminology, creating a basic Keras model, preprocessing the observation into something our model can use, and getting an action from the model.
It is assumed that you already know what the game is and how it works on a basic level. If you do not know that yet, please have a look at lesson 1, and if something is still not clear, please look at the environment details.
At the beginning of each lesson, there will be three GitHub links that can be useful while you work through the lessons. The Browse link will open the GitHub repository for Pommerman at the place where the changes for the lesson you are reading were added, without including any changes introduced in future chapters. The Zip link is a download link for a zip file including the entire game up to and including the changes in this lesson. The Diff link will open a graphical view of all the changes that were made in the lesson you are about to read.
//TODO Put in correct links
The GitHub links for this lesson are: Browse, Zip, Diff.
The links are currently only available for members.
There is a lot of terminology used in machine learning. You will not need to know the ins and outs of most of it, but a basic understanding of these concepts and terms will greatly help in understanding the choices made in this lesson series.
To get started we are going to create a Keras model that accepts a NumPy array of a predefined size and outputs an array with the expected value for each action. Only the input shape and output shape of our model matter; everything in between can be to your liking, although that is not that important for now. To create this model we are going to use TensorFlow, and more specifically Keras. See the tf.keras API for more information.
import tensorflow.keras as keras
import tensorflow.keras.layers as layers


def create_model(self, input_shape: tuple, output_shape: int):
    """
    Creates and compiles a keras model based on an input_shape and
    output_shape, and assigns it to self.model.

    example input_shape: (7, 11, 11)
    output_shape: 6
    """
    pass
import tensorflow.keras as keras
from tensorflow.keras.layers import Dense, Conv2D, Flatten


def create_model(self, input_shape: tuple, output_shape: int):
    """Creates and compiles a keras model based on an input_shape and
    output_shape, and assigns it to self.model."""
    # Subject to change, this is purely a test model.
    # The final architecture can be whatever you want!
    # For more information about which layers there are and what they do,
    # look at the tf API or the keras API.
    model = keras.Sequential()
    # A convolution layer is very useful when the shape of a feature is
    # important, and not necessarily the location of that shape.
    model.add(Conv2D(filters=5, kernel_size=3, input_shape=input_shape))
    # Conv2D returns a rank 3 tensor (a 3d array of values); with the default
    # settings its shape is (new_rows, new_cols, filters). The Flatten layer
    # reshapes this to a rank 1 tensor (a 1d array of values).
    model.add(Flatten())
    # A Dense layer is a fully connected layer with output shape (units,)
    # (if its input is a rank 1 tensor).
    model.add(Dense(units=output_shape, activation='linear'))
    # To finalise the model we compile it; in this case we use the 'adam'
    # optimizer and 'categorical_crossentropy' as the loss function.
    model.compile('adam', loss='categorical_crossentropy')
    # Finally, we assign the compiled model to self.model so the agent can use it.
    self.model = model
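If you want to check that the shapes line up, you can build the model and inspect it. Below is a minimal sanity check, assuming your agent class is called MyAgent and can be constructed without arguments (both are placeholders; use your own class):

import numpy as np

# Sanity check (not part of the agent): build the model with the example
# shapes from the docstring and feed it one random observation.
agent = MyAgent()  # placeholder for your own agent class
agent.create_model(input_shape=(7, 11, 11), output_shape=6)
agent.model.summary()  # prints an overview of the layers

dummy_batch = np.random.rand(1, 7, 11, 11)  # a batch with one observation
prediction = agent.model.predict(dummy_batch)
print(prediction.shape)  # (1, 6): one expected value for each action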
Our model is not able to understand the observation from the environment directly, because the environment does not return a 3-dimensional array. Our model accepts a 3-dimensional input of numeric values of size (x, y, z), while the observation we get is a dictionary containing entries with different values and shapes. See env-info for a detailed explanation of each entry.
We are going to encode the environment information in a 3d array. Since we do not need all values, we are going to use the following:
| Field | Dimensions |
|---|---|
| board | 11 x 11 |
| bomb_blast_strength | 11 x 11 |
| bomb_life | 11 x 11 |
| bomb_moving_direction | 11 x 11 |
| flame_life | 11 x 11 |
| position | (1, 2) |
| can_kick | (1,) |
| ammo | (1,) |
| blast_strength | (1,) |
| enemies | (1, 4) |
Note: All layers in your 3d array need to have the same x and y dimensions. As the board is always 11 x 11, we will use that as a base and try to place as much information as possible in a single layer.
import numpy as np


def preprocess(self, obs: dict):
    """Preprocess the observation into a format that our model can read.

    :arg obs: raw observation from the environment
    :return: numpy array with the preprocessed observations
    """
    pass
import numpy as np


def preprocess(self, obs: dict):
    """Preprocess the observation into a format that our model can read.

    :arg obs: raw observation from the environment
    :return: numpy array with the preprocessed observations
    """
    position = obs['position']
    enemies = obs['enemies']
    # Make all enemy ids 11; as we focus on ffa we do not have teammates.
    # The enemies entry contains Item constants, so take their numeric values.
    enemy_ids = [getattr(enemy, 'value', enemy) for enemy in enemies]
    obs['board'] = np.where(np.isin(obs['board'], enemy_ids), 11, obs['board'])
    # Make our own id 10
    obs['board'][position[0], position[1]] = 10
    # As every layer needs to be 11 x 11, we create an extra 11 x 11 array
    # to put the non-positional data in.
    meta_info = np.zeros(shape=(11, 11))
    meta_info[0][0] = obs['ammo']
    meta_info[0][1] = obs['blast_strength']
    meta_info[0][2] = int(obs['can_kick'])
    # Stack all layers into a single array of shape (number of layers, 11, 11)
    board_obs = [obs['board'],
                 obs['bomb_blast_strength'],
                 obs['bomb_life'],
                 obs['bomb_moving_direction'],
                 obs['flame_life'],
                 meta_info]
    model_input = np.stack(board_obs)
    return model_input
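Note that np.stack adds the new axis in front by default, so the layer index comes first in the resulting shape. A quick way to convince yourself of this:

import numpy as np

# np.stack places the new axis first by default, so stacking
# six 11 x 11 layers yields an array of shape (6, 11, 11).
layers = [np.zeros((11, 11)) for _ in range(6)]
stacked = np.stack(layers)
print(stacked.shape)  # (6, 11, 11)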
Now that we are able to preprocess the data into a format our model can read, we are going to use the model to predict an action.
Note: Keras expects a batch during a predict call, so you will need to use np.expand_dims(obs, 0) to give the observation the right dimensions. You will also need to undo this on the output.
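For example, with a preprocessed observation of shape (6, 11, 11), adding and removing the batch dimension looks roughly like this (the shapes are just illustrative):

import numpy as np

obs = np.zeros((6, 11, 11))     # one preprocessed observation
batch = np.expand_dims(obs, 0)  # shape (1, 6, 11, 11): a batch of one
print(batch.shape)

# model.predict(batch) would return one row of action values per
# observation in the batch, so you take row [0] afterwards.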
def act(self, obs: dict, action_space):
    """Main function that is being called by the environment."""
    # TODO Preprocess the obs
    # TODO Create the model if it does not exist yet,
    #      using the shapes from our preprocessed obs and action_space
    # TODO Get an action from our model
    return 0  # 0 corresponds to the Stop action
def act(self, obs: dict, action_space):
    """Main function that is being called by the environment."""
    # Preprocess the obs
    obs = self.preprocess(obs)
    # Create the model if it does not exist yet
    if self.model is None:
        self.create_model(obs.shape, action_space.n)
    # Get an action from our model.
    # Because Keras expects a batch, we need to add a batch dimension
    obs = np.expand_dims(obs, 0)
    # Keras will also return predictions for a batch,
    # so we select the first (and only) value array
    prediction = self.model.predict(obs)[0]
    # The prediction contains the expected value for each possible action.
    # As we want to maximise our expected value, we take the index of the
    # highest value, which corresponds with that action.
    action = int(np.argmax(prediction))
    return action
Run your agent by importing it and adding it to the serpentine/run.py file.
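If you are unsure what that looks like, the sketch below shows the standard way to run a game with the pommerman package. The exact contents of serpentine/run.py in the lesson repository may differ, and the import path and the name MyAgent are placeholders for your own agent:

import pommerman
from pommerman import agents

from serpentine.my_agent import MyAgent  # placeholder import path

# One free-for-all game: our agent against three built-in SimpleAgents.
agent_list = [MyAgent(),
              agents.SimpleAgent(),
              agents.SimpleAgent(),
              agents.SimpleAgent()]
env = pommerman.make('PommeFFACompetition-v0', agent_list)

state = env.reset()
done = False
while not done:
    env.render()
    # env.act collects an action from every agent, including ours.
    actions = env.act(state)
    state, reward, done, info = env.step(actions)
env.close()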
In this lesson we have learned how to create a basic model, preprocess the given observation into something useful for our model, and get an action from our model based on that observation.