In case you have any remarks or questions, they are always welcome at either the education committee (Education Commissie), via the Slack channel ec-helpme, or at our e-mail address education@serpentineai.nl.
In this lesson we are going to talk about the following things:
The GitHub links for this lesson are: Browse, Zip, Diff.
The links are currently only available for members.
can_place_bomb
Now that we can move to any location that we want, it is time to try and blow something up. Before we can blow something up, we need to check if the agent can place a bomb. For this we are going to write the can_place_bomb method, which returns a boolean that is True whenever the agent can place a bomb. It is up to you to determine the input arguments for this method and then code it.
In case you are not sure what we can use for this, take another look at what is stored inside the observation. Using the board alone is not going to work. Can you figure out why?
With this new method we are going to place a bomb in the act method whenever it is possible. This means that it is the first check that we perform, and the first action that we put in our queue. The code that has to be added to act looks like this:
if self.can_place_bomb(**your arguments**):
    self.queue.append(Action.Bomb)
If it works, your agent will place two bombs: one that it avoids by moving to the field (2, 2), if that field is empty, and one that blows up your agent because it does not move away. This has to be fixed, because suicide is not what we want!
You might think that using the original board would work, but we actually need two other arguments. This is the reasoning:
- To see whether a bomb has already been placed at our location, we need either bomb_life or bomb_blast_strength. In this example bomb_life is used as the input argument.
- To see whether we still have a bomb to place, we need obs["ammo"].
- Lastly, there is one more input argument: my_location, the position where the bomb would be placed.
- Because can_place_bomb is a method, we also need to lead the method arguments with self. This value is always passed automatically to a method as its first argument.
Now see if you can write a method can_place_bomb that takes bomb_life, ammo and my_location as input arguments and returns a boolean that is True whenever the agent can place a bomb.
The solutions go from a complete, step-by-step version of the idea to a more concise form. Always make sure that making your code more compact does not hurt your ability to understand it. A general guideline is that a single line of code should not exceed 80 to 120 characters (80 is preferred, 120 is a hard maximum) and should perform only a single, clear task.
Example 1:
def can_place_bomb(self, bomb_life: np.ndarray, ammo: int, my_location: tuple) -> bool:
    """ Checks if you can place a bomb: returns True if no bomb is placed here yet and you have enough ammo. """
    # Check for bombs
    if not bomb_life[my_location] == 0:
        return False
    # Check for ammo
    if not ammo > 0:
        return False
    return True
We can simplify the last if statement in the same way as we did in the check_direction method:
Example 2:
def can_place_bomb(self, bomb_life: np.ndarray, ammo: int, my_location: tuple) -> bool:
    """ Checks if you can place a bomb: returns True if no bomb is placed here yet and you have enough ammo. """
    # Check for bombs
    if not bomb_life[my_location] == 0:
        return False
    # Check for ammo
    return ammo > 0
In example 2 there are only two boolean conditions that we are checking. Both conditions have to be True for us to return True. This can be done in one line using the and operator. Check out logical operators for a more in-depth overview.
Example 3:
def can_place_bomb(self, bomb_life: np.ndarray, ammo: int, my_location: tuple) -> bool:
    """ Checks if you can place a bomb: returns True if no bomb is placed here yet and you have enough ammo. """
    return bomb_life[my_location] == 0 and ammo > 0
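To try the check outside of the agent class, here is a minimal sketch that copies the one-line version into a plain function (without self) and calls it on a made-up 8 x 8 bomb_life array. The bool(...) wrapper is added in this sketch only. Comparing a NumPy array element with == gives a numpy.bool_, which works fine in an if statement but is not a plain Python bool.

```python
import numpy as np


def can_place_bomb(bomb_life: np.ndarray, ammo: int, my_location: tuple) -> bool:
    """Standalone copy of the method (no `self`), so it can be tried in isolation."""
    # bool(...) turns a possible numpy.bool_ into a plain Python bool.
    return bool(bomb_life[my_location] == 0 and ammo > 0)


# A made-up 8 x 8 bomb_life array with one bomb at (1, 1) that has 5 turns left.
bomb_life = np.zeros((8, 8))
bomb_life[1, 1] = 5

print(can_place_bomb(bomb_life, 1, (0, 0)))  # True: no bomb here and ammo left
print(can_place_bomb(bomb_life, 1, (1, 1)))  # False: a bomb is already placed here
print(can_place_bomb(bomb_life, 0, (0, 0)))  # False: no ammo left
```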
In the act method this becomes:
goal_location = (2, 2)
if self.can_place_bomb(obs['bomb_life'], obs['ammo'], my_location):
    self.queue.append(Action.Bomb)
for direction in self.create_path(board, my_location, goal_location):
    self.queue.append(direction.action)
Optionally, you can assign variables for obs['bomb_life'] and obs['ammo']:
bomb_life = obs['bomb_life']
ammo = obs['ammo']
goal_location = (2, 2)
if self.can_place_bomb(bomb_life, ammo, my_location):
    self.queue.append(Action.Bomb)
for direction in self.create_path(board, my_location, goal_location):
    self.queue.append(direction.action)
A fixed goal location will always lead to the same problem, unless we stop placing bombs, but hey, it is Pommerman. Therefore the goal location has to be determined over and over again. Let's create a new method move_to_safe_location, where safe means not on a bomb and not in its flames. Here it becomes a bit tricky, because we have to do several things at the same time.
Now see if you can solve the following two tasks:
Luckily, we can reuse some of the code from before. Moving the bot is already implemented, and finding all locations that our bot can reach is part of moving to a location.
The env-info shows that all the information about bombs is stored in bomb_life and bomb_blast_strength. We also need some extra information about the flames, which is stored in flame_life. With this information we can build a map of all the dangerous locations on the board.
You might have guessed it already from the title of the next section: we are first going to create a danger map. This map helps us locate safe places, so that afterwards we can implement move_to_safe_location.
Now before we start programming we have to make some choices on how we are going to represent the danger map. We could simply state that everywhere the bombs can reach is going to be dangerous. But if a bomb has just been placed we still have 9 turns left before it really becomes deadly.
To also capture this information, we are going to create a danger map with danger levels. A 1 means you are dead if you move here, a 10 means you should probably move away at some point in the future, and a 0 means you do not need to worry at all. This means that we will create an array of size 8 x 8, matching the current board size, that contains the integers from 0 to 10 (10 is the maximum life of a bomb).
As a reminder, this is the information that we are going to use:
- bomb_blast_strength, to find the areas where the flames will be (and the bomb locations).
- bomb_life, the time before the bomb is going to explode.
- flame_life, the areas where there is already a flame, and how long the flame will stay there.
Now we are going to split the problem into two parts: the flames part and the bombs part. There is an advantage to starting with the flames. Any idea why?
So we first have to set up our initial danger map. The flame_life map is a good starting point for this, since it already contains the remaining lifetime of every flame. A flame means that we would die if we move to it, so we change its danger level to 1. There are two ways to do this in numpy:
- Mask assignment: array[condition] = value.
- np.where: result = np.where(condition, value_if_true, value_if_false).
For brevity, we will use the mask assignment. Note that this is an in-place operation, so the values of the original array are changed. This is not true for np.where, which returns a new, modified array that we have to store in a variable again.
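The difference between the two options shows up on a small, made-up flame_life array. In this sketch, np.where keeps the non-flame values by passing the array itself as the value if false. The mask assignment is done on a copy, so the original array stays untouched.

```python
import numpy as np

# A small made-up flame_life array: 0 means no flame, >0 is the remaining flame life.
flame_life = np.array([[0, 2, 0],
                       [0, 0, 1],
                       [3, 0, 0]])

# Option 1: np.where builds and returns a NEW array; flame_life is unchanged.
danger_where = np.where(flame_life > 0, 1, flame_life)

# Option 2: mask assignment changes values IN PLACE, so work on a copy
# to keep the original array intact.
danger_mask = flame_life.copy()
danger_mask[danger_mask > 0] = 1

print(danger_mask)
print(flame_life)  # still holds the original values 2, 1 and 3
```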
The following code initializes the variable danger_map with obs['flame_life']. It then uses a mask to set all flames (values bigger than 0) to danger level 1:
# Copy, otherwise the in-place mask below would also change obs['flame_life']
danger_map = obs['flame_life'].copy()
danger_map[danger_map > 0] = 1
The next step is to add the information about the exploding bombs. The solution has been split into 3 parts:
1. Find all bomb locations, bomb timers and strengths.
2. Combine the row, col, timer and strength information per bomb.
3. Create the + sign of each bomb explosion and put that information into the danger map.
Try it yourself if you feel up to the challenge, or take a look at our solution for this problem.
Now the bomb information has to be added to the danger map. For now, let us ignore that crates block the flames of a bomb, and that bombs can be chained (one bomb explosion can directly trigger another bomb if the flames reach that bomb). This is okay, as long as you keep this kind of information in the back of your mind. It simplifies the code and makes it easier to test whether your idea works in practice. It is often considered good practice to start with simple working code and then increase the complexity step by step, taking more and more edge cases (conditions that occur only rarely) into account.
To construct the bomb blast radius, we have to create a symmetric plus sign, where the length of each line depends on the bomb's strength. To find the bomb locations we can use np.where. To find the timers and strengths we can use the return value of np.where as an index.
# Find all bomb locations, bomb timers and strength
bombs = np.where(obs['bomb_life'] > 0)
bombs_timers = obs['bomb_life'][bombs]
bomb_strength = obs['bomb_blast_strength'][bombs]
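Here is a sketch with a made-up 4 x 4 board holding two bombs, to show what these three lines produce. The key point is that np.where with only a condition returns a tuple with one array of row indices and one array of column indices. That tuple can be used directly as an index.

```python
import numpy as np

# Made-up 4 x 4 example with two bombs, at (0, 1) and (2, 3).
bomb_life = np.zeros((4, 4))
bomb_blast_strength = np.zeros((4, 4))
bomb_life[0, 1], bomb_blast_strength[0, 1] = 7.0, 2.0
bomb_life[2, 3], bomb_blast_strength[2, 3] = 3.0, 3.0

# A tuple of two arrays: the row indices and the column indices of the bombs.
bombs = np.where(bomb_life > 0)
print(bombs[0], bombs[1])  # [0 2] [1 3]

# Using the tuple as an index picks out the bomb cells, in the same order.
bombs_timers = bomb_life[bombs]
bomb_strength = bomb_blast_strength[bombs]
print(bombs_timers, bomb_strength)  # [7. 3.] [2. 3.]
```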
This section uses a lot of built-in functions. Check the built-ins tab for an overview and an explanation of the built-in functions used here.
To make use of the symmetry of the explosion, we can loop from the start point to the end point of a line and update the values one by one. The line runs over either the rows (y-axis) or the columns (x-axis). This is a regular for loop, but over the values between a start and an end point. Python has a built-in function that generates these values from a start point to an end point: the range function.
The range function requires its values to be integers, but the values in bomb_life and bomb_blast_strength are np.float64. To use them with range, we convert them to integers with another built-in function called map. This leads to the following initialization code:
# Find all bomb locations, bomb timers and strength
bombs = np.where(obs['bomb_life'] > 0)
bombs_timers = map(int, obs['bomb_life'][bombs])
bomb_strength = map(int, obs['bomb_blast_strength'][bombs])
We need one more built-in function, called zip, to combine the following data per bomb: the location (row, col), the time until the explosion and the strength of the bomb. This looks like this:
for row, col, timer, strength in zip(*bombs, bombs_timers, bomb_strength):
    pass
The * before bombs might be a bit confusing. It is needed because bombs contains two different arrays, namely the row indices and the col indices. The unpacking operator (*) splits bombs into those two arrays, so the zip function joins 4 different sequences together.
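The sketch below reuses the same made-up two-bomb board to show what zip(*bombs, bombs_timers, bomb_strength) yields per bomb. Keep in mind that map returns a lazy iterator, so bombs_timers and bomb_strength can be looped over only once.

```python
import numpy as np

# The same made-up 4 x 4 board with two bombs, at (0, 1) and (2, 3).
bomb_life = np.zeros((4, 4))
bomb_blast_strength = np.zeros((4, 4))
bomb_life[0, 1], bomb_blast_strength[0, 1] = 7.0, 2.0
bomb_life[2, 3], bomb_blast_strength[2, 3] = 3.0, 3.0

bombs = np.where(bomb_life > 0)
bombs_timers = map(int, bomb_life[bombs])
bomb_strength = map(int, bomb_blast_strength[bombs])

# zip(*bombs, ...) is the same as zip(bombs[0], bombs[1], ...): 4 iterables in total.
per_bomb = []
for row, col, timer, strength in zip(*bombs, bombs_timers, bomb_strength):
    per_bomb.append((int(row), int(col), timer, strength))
print(per_bomb)  # [(0, 1, 7, 2), (2, 3, 3, 3)]

# The map iterators are now used up, so a second loop would not run at all.
print(list(bombs_timers))  # []
```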
To create the plus sign, we need the start and end of the bomb blast. We find them by subtracting the strength from, and adding it to, the bomb's current location, for both the row and the column. The value stored in the danger map is the time until the explosion. See if you can implement this.
Note 1: We have to stay inside the playing field.
Note 2: The range function stops just before its end value, so the end value itself is never included.
# Now we are going to set the danger information
for row, col, timer, strength in zip(*bombs, bombs_timers, bomb_strength):
    # Reduce strength by one, since we are creating a `+` form, with the bomb as center.
    strength -= 1
    # Calculate the upper and lower ranges of the bombs (this is the + sign),
    # clipped to the 8 x 8 board (indices 0 to 7).
    row_low, row_high = max(row - strength, 0), min(row + strength, 7)
    col_low, col_high = max(col - strength, 0), min(col + strength, 7)
    # Set the information on the danger map
    for row_danger in range(row_low, row_high + 1):
        danger_map[row_danger, col] = timer
    for col_danger in range(col_low, col_high + 1):
        danger_map[row, col_danger] = timer
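You can check the clamping from Note 1 and the + 1 from Note 2 on a single bomb near the edge of the board. The sketch below is a standalone version of the loop body. The helper name add_bomb_cross is made up for this example, and instead of a hard-coded last index it reads the board size from danger_map.shape. Like the solution, it assumes that the flames reach strength - 1 cells in each direction.

```python
import numpy as np


def add_bomb_cross(danger_map: np.ndarray, row: int, col: int, timer: int, strength: int) -> None:
    """Hypothetical helper: write `timer` on the + shape of one bomb, in place."""
    # The flames reach strength - 1 cells in every direction from the bomb.
    reach = strength - 1
    last_row, last_col = danger_map.shape[0] - 1, danger_map.shape[1] - 1
    # Clamp the line ends to the board (Note 1).
    row_low, row_high = max(row - reach, 0), min(row + reach, last_row)
    col_low, col_high = max(col - reach, 0), min(col + reach, last_col)
    # range() excludes its end value, hence the + 1 (Note 2).
    for row_danger in range(row_low, row_high + 1):
        danger_map[row_danger, col] = timer
    for col_danger in range(col_low, col_high + 1):
        danger_map[row, col_danger] = timer


danger_map = np.zeros((8, 8), dtype=int)
add_bomb_cross(danger_map, row=0, col=6, timer=9, strength=3)
print(danger_map[:3])
```

With the bomb at (0, 6) and strength 3, the line to the right stops at column 7 instead of running off the board, and the vertical line starts at row 0.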
The last part is returning the danger_map and adding the documentation and type hints. Since this function returns a map with specific information, you probably want to use a multi-line docstring:
def create_danger_map(self, obs: dict) -> np.ndarray:
    """
    Line 1...
    Line 2...
    etc...
    """
This section used a lot of built-in functions. The following table gives an overview of them. For a complete overview of built-in functions, check python functions.
function | arguments | example | explanation |
---|---|---|---|
range | start, end, step | In: list(range(3)) Out: [0, 1, 2]; In: list(range(2, 6, 2)) Out: [2, 4] | Creates a sequence of numbers from start up to (not including) end, using a fixed step size. By default start = 0 and step = 1. |
map | callable, iterable | In: list(map(str, [0, 1, 2])) Out: ['0', '1', '2'] | Takes an iterable (anything you can use in for _ in iterable, such as a list or a dict) and applies the callable, a function or method, to every item in it. It returns a lazy iterator, so you often want to convert the results to a list using list(map(..., ...)). |
zip | iterable, iterable, ... | In: list(zip([0, 1], [3, 4])) Out: [(0, 3), (1, 4)] | Combines iterables entry for entry. It stops as soon as the shortest iterable is exhausted, so extra entries in longer iterables are ignored. It returns a lazy iterator, so you often want to convert the results to a list using list(zip(..., ...)). |
max | iterable, or several values | In: max([0, 1, 2]) Out: 2; In: max(-1, 0) Out: 0 | Returns the maximum value of an iterable, or of the values passed as separate arguments. |
min | iterable, or several values | In: min([0, 1, 2]) Out: 0; In: min(9, 7) Out: 7 | Returns the minimum value of an iterable, or of the values passed as separate arguments. |
def create_danger_map(self, obs: dict) -> np.ndarray:
    """
    Returns a map the size of the board, with the following positional encoding:
    0 : A safe place where you can move to.
    1 : A place that will kill you if you are there.
    >1 : A place that will be dangerous in the future (counts down to 1).
    """
    # Set our initial danger map (a copy, so obs['flame_life'] itself is not modified)
    danger_map = obs['flame_life'].copy()
    danger_map[danger_map > 0] = 1
    # Find all bomb locations, bomb timers and strength
    bombs = np.where(obs['bomb_life'] > 0)
    bombs_timers = map(int, obs['bomb_life'][bombs])
    bomb_strength = map(int, obs['bomb_blast_strength'][bombs])
    # Now we are going to set the danger information
    for row, col, timer, strength in zip(*bombs, bombs_timers, bomb_strength):
        # Reduce strength by one, since we are creating a `+` form, with the bomb as center.
        strength -= 1
        # Calculate the upper and lower ranges of the bombs (this is the + sign).
        row_low, row_high = max(row - strength, 0), min(row + strength, 7)
        col_low, col_high = max(col - strength, 0), min(col + strength, 7)
        # Set the information on the danger map, first the column line and then the row line.
        for row_danger in range(row_low, row_high + 1):
            danger_map[row_danger, col] = timer
        for col_danger in range(col_low, col_high + 1):
            danger_map[row, col_danger] = timer
    return danger_map
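Because create_danger_map only needs the observation dictionary, it can be checked without running a full match. The sketch below copies the method into a plain function (without self) and feeds it a fake observation with made-up values. It copies flame_life before masking, so the fake observation is not changed. It also reads the last board index from danger_map.shape instead of hard-coding 7.

```python
import numpy as np


def create_danger_map(obs: dict) -> np.ndarray:
    """Standalone copy of the method (no `self`), for testing with a fake observation."""
    # Copy, so that the in-place mask below does not change obs['flame_life'].
    danger_map = obs['flame_life'].copy()
    danger_map[danger_map > 0] = 1
    bombs = np.where(obs['bomb_life'] > 0)
    bombs_timers = map(int, obs['bomb_life'][bombs])
    bomb_strength = map(int, obs['bomb_blast_strength'][bombs])
    last_row, last_col = danger_map.shape[0] - 1, danger_map.shape[1] - 1
    for row, col, timer, strength in zip(*bombs, bombs_timers, bomb_strength):
        strength -= 1
        row_low, row_high = max(row - strength, 0), min(row + strength, last_row)
        col_low, col_high = max(col - strength, 0), min(col + strength, last_col)
        for row_danger in range(row_low, row_high + 1):
            danger_map[row_danger, col] = timer
        for col_danger in range(col_low, col_high + 1):
            danger_map[row, col_danger] = timer
    return danger_map


# A fake 8 x 8 observation: one flame at (7, 7) and one bomb at (3, 3).
obs = {
    'flame_life': np.zeros((8, 8)),
    'bomb_life': np.zeros((8, 8)),
    'bomb_blast_strength': np.zeros((8, 8)),
}
obs['flame_life'][7, 7] = 2.0
obs['bomb_life'][3, 3] = 9.0
obs['bomb_blast_strength'][3, 3] = 2.0

danger_map = create_danger_map(obs)
print(danger_map[2:5, 2:5])  # the + shape of the bomb, all with timer 9
```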
With the danger map created, we now have to use it to find safe locations and paths. This is very similar to what we did with can_move_to. The only difference is that we now also have to check the danger map, to see whether a move is actually safe. For this, create a new method with the following interface:
def find_reachable_safe_location(self, board: np.ndarray, danger_map: np.ndarray, location: tuple) -> tuple:
    return location
It takes the board, the danger_map and a location (for example my_location) and returns a safe location that you can reach, preferably the closest one. For simplicity, only locations that are absolutely dangerous, with a danger map value of 1, are treated as blocked while searching.
We are going to implement another BFS, which takes the following steps:
1. Create two lists: to visit and have visited.
2. Take the first node from the to visit list to check.
3. If this node is safe (danger level 0), return it.
4. Add the neighbors of the node to the to visit list if they are on the board (checked with in_bounds), not visited yet, reachable and not deadly (a danger map value other than 1).
5. Add the node to the visited list and move on.
Step 3 might seem a bit weird, because we do not want to go over dangerous nodes. The reason a dangerous node can show up there is that our starting location might be dangerous. This is why we allow the first node (location) to be dangerous, but none of the following nodes. Step 4 ensures that the following nodes are not deadly, because it filters out the deadly nodes.
def find_reachable_safe_location(self, board: np.ndarray, danger_map: np.ndarray, location: tuple) -> tuple:
    to_visit = [location]
    visited = []
    while to_visit:
        point = to_visit.pop(0)
        if danger_map[point] == 0:
            return point
        for direction in Directions.NEIGHBORS:
            new_point = tuple(np.array(point) + direction.array)
            # Filter out the bad points: off the board, already visited or deadly (level 1)
            if not self.in_bounds(new_point) or new_point in visited or danger_map[new_point] == 1:
                continue
            # If we can reach this point add the point to the to visit list
            if self.check_direction(board, point, direction):
                to_visit.append(new_point)
        visited.append(point)
    # no safe place was found, so stay where you are and pray.
    return location
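You can also try the same search outside of the agent with a standalone sketch. Directions, in_bounds and check_direction come from the earlier lessons and are not shown here, so this version makes its own assumptions. The four moves are plain (row, col) offsets, and a cell counts as walkable only if its board value is 0 (a passage). It also swaps the lists for a collections.deque and a set, and marks points as visited when they are queued. This is a common BFS variant that avoids adding the same point twice.

```python
from collections import deque

import numpy as np

# Assumption for this sketch: the four moves as (row, col) offsets.
NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def find_reachable_safe_location(board: np.ndarray, danger_map: np.ndarray, location: tuple) -> tuple:
    """BFS from `location` to the closest reachable cell with danger level 0."""
    rows, cols = board.shape
    to_visit = deque([location])
    visited = {location}
    while to_visit:
        point = to_visit.popleft()
        if danger_map[point] == 0:
            return point
        for d_row, d_col in NEIGHBORS:
            new_point = (point[0] + d_row, point[1] + d_col)
            # Filter out points that are off the board, already seen, or deadly (level 1).
            if not (0 <= new_point[0] < rows and 0 <= new_point[1] < cols):
                continue
            if new_point in visited or danger_map[new_point] == 1:
                continue
            # Assumption: only passages (board value 0) can be walked on.
            if board[new_point] == 0:
                visited.add(new_point)
                to_visit.append(new_point)
    # No safe place was found, so stay where you are.
    return location


# Made-up 5 x 5 example: the agent (hypothetical board value 10) stands on its own
# fresh bomb at (2, 2), whose + shape has danger level 9.
board = np.zeros((5, 5), dtype=int)
board[2, 2] = 10
danger_map = np.zeros((5, 5), dtype=int)
danger_map[2, 1:4] = 9
danger_map[1:4, 2] = 9
print(find_reachable_safe_location(board, danger_map, (2, 2)))  # (0, 2)
```

In this example all four neighbors of (2, 2) lie inside the blast with level 9, which is why the filter rejects only level 1. A filter that rejected every non-zero level would leave the agent stuck at (2, 2).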
With the new helper methods we can finalize move_to_safe_location. First create the danger map, then check if your position is safe. If it is, return this location. If your position is not safe, you have to find a safe location, so return the new location from find_reachable_safe_location.
def move_to_safe_location(self, obs: dict) -> tuple:
    """ Returns a location to which we can safely move. """
    # Find all positions that can lead to our destruction.
    danger_map = self.create_danger_map(obs)
    # Check if our current position is safe, if so we can go/stay there.
    my_location = obs['position']
    if danger_map[my_location] == 0:
        return my_location
    # Find a reasonable safer location
    safe_position = self.find_reachable_safe_location(obs['board'], danger_map, my_location)
    return safe_position
We do not have to explicitly check if our current location is safe, because find_reachable_safe_location already does this: the starting location is the first node it checks.
def move_to_safe_location(self, obs: dict, location: tuple) -> tuple:
    """ Returns a location to which we can safely move. """
    # Create a mapping of positions and danger level
    danger_map = self.create_danger_map(obs)
    # find_reachable_safe_location returns `location` itself if it is already safe.
    return self.find_reachable_safe_location(obs['board'], danger_map, location)
The last change we have to make in my_agent.py is in the act method. Here we replace the fixed goal location with the result of move_to_safe_location. Now your bot should place a bomb and run to a safe place.
Note: the commented-out code is not needed at this moment. If we ever need it, we can restore it from our Git history, as will be explained in the Git section.
def act(self, obs, action_space):
    # Main event that is being called on every turn.
    if not self.queue:
        my_location = obs['position']
        board = obs['board']
        goal_location = self.move_to_safe_location(obs, my_location)
        if self.can_place_bomb(obs['bomb_life'], obs['ammo'], my_location):
            self.queue.append(Action.Bomb)
        for direction in self.create_path(board, my_location, goal_location):
            self.queue.append(direction.action)
    # If we cannot move in any direction, send a pass
    if not self.queue:
        self.queue.append(Action.Stop)
    return self.queue.pop(0)
To test our new evasive bot, we are going to increase the number of episodes played. For this we have to change some values in run.py. Move over to that file and make the following changes:
- Change the for loop to run 10 times, so we get to see 10 matches: for episode in range(10):
- Call env.render(do_sleep=False), so the matches are rendered without pausing between steps.
Instead of using git only at the end, we can also add and commit whenever we have implemented a full method, so that we create more checkpoints. But where are all the commits stored, and how can we use them? In PyCharm, open the Git menu at the top, then choose Current File and Show History. There we can see all the changes that we made, and also the commits. If we ever break working code, we can now revert (go back) to an earlier point in time. As a final step, add and commit the new changes. (Go back to the previous lesson if you need to see the commands.)
In this lesson we have looked at how we can safely place a bomb and run away. If everything is implemented correctly, you should see your agent keep walking around and dodging all bombs. In most cases this bot is good enough to win against the SimpleAgent. It seems that staying alive is a good tactic in any game.
In the next part we are going to take a look at destroying specific objects instead of randomly putting down a bomb whenever we can.