If you have any remarks or questions, they are always welcome at the education committee, via the Slack channel ec-helpme or at our e-mail address education@serpentineai.nl.
Now that we have a basic bot working, we are going to improve it so that it becomes more fun to watch and wins against a wider range of opponents. Since we have already built the foundation of our bot in the previous lessons, the remaining lessons will cover small tweaks and coding optimizations. After all of your hard work the following lessons will be relatively short, but they leave all the coding (and thinking) up to you.
In this lesson we are going to talk about the following things:
The GitHub links for this lesson are: Browse, Zip, Diff.
The links are currently only available for members.
Another feature of git is branching. This allows multiple people to work on the same project without interfering with each other. It is also a good way to try multiple approaches in parallel. For example, say you want to implement an A* search algorithm: you create a branch, start working on it, and halfway through decide it might be too much work. You then go back to master (the point where you branched off), start a new branch where you implement a simpler BFS algorithm, and find out that it isn't fast enough. Instead of having to start all over, you can just switch branches and continue working on the A* algorithm.
This is just one example of how to use branches, and there are a lot more (and better) examples out there. Learning git, or any version control tool, helps you collaborate and keep track of different attempts much better than copy-pasting to other files every time. To close out this lesson, add and commit your work (and push it).
Before making a new branch, always make sure that you have committed (and pushed) all changes, otherwise your uncommitted changes will move along to the new branch instead of staying on the old one. If you want to try it out, create a new branch for this lesson:
git checkout -b lesson-6
Now you will be automatically switched to the branch lesson-6. If you want to switch between branches you can use:
git checkout <branch name>
Your default branch is called master (or main in some companies).
Some things might be better explained with an image.
So what happens is that we create a branch from master, on which we develop/work. Then, when we are happy with the results, we merge/combine these changes into the master branch. This ensures that we always have a working master copy, to which we can return when things are on fire.
The previous lesson is already showing off a fairly good bot, but that is only compared to the SimpleAgent, which is prone to killing itself. In this tutorial we are also going to compare it against the PlayerAgent, which is a bot that does nothing (unless you use the arrow keys).
Make the following changes in run.py, in the main function (not a method):
agent_list = [
    MyAgent(),
    agents.PlayerAgent(),
]
result = info['result'].name if 0 in info.get('winners', []) or info['result'].name == 'Tie' else 'Lose'
print(f"Episode: {episode:2d} finished, result: {result}")
We changed the result printing to show three states instead of two: Win, Lose and Tie. Run your bot and see what happens against this new opponent. After this you will probably see why we have to keep improving our bot.
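The one-line result expression can be hard to read at first, so here is a minimal sketch of the same decision written out as a function. The `Result` enum below is only a stand-in for the enum the environment puts in `info['result']`, and the name `describe_result` is made up for this example.

```python
from enum import Enum


class Result(Enum):
    """Stand-in for the environment's result enum (an assumption for this sketch)."""
    Win = 0
    Loss = 1
    Tie = 2


def describe_result(info: dict, agent_id: int = 0) -> str:
    """Map the info dictionary of a finished episode to 'Win', 'Tie' or 'Lose'."""
    # A tie is reported by the result itself.
    if info['result'].name == 'Tie':
        return 'Tie'
    # Somebody won: check whether our agent (index 0 in agent_list) is among the winners.
    if agent_id in info.get('winners', []):
        return 'Win'
    return 'Lose'


print(describe_result({'result': Result.Win, 'winners': [0]}))  # Win
print(describe_result({'result': Result.Win, 'winners': [1]}))  # Lose
print(describe_result({'result': Result.Tie}))                  # Tie
```

Using `info.get('winners', [])` keeps the check safe when the key is missing, which is the case in the tie example above.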
def main():
    """ Simple function to bootstrap a game. """
    # Print all possible environments in the Pommerman registry
    print(pommerman.REGISTRY)
    # Create a set of agents (two for the OneVsOne environment)
    agent_list = [
        MyAgent(),
        agents.PlayerAgent(),
    ]
    # Make the one-versus-one environment using the agent list
    env = pommerman.make('OneVsOne-v0', agent_list)
    # Run the episodes just like OpenAI Gym
    for episode in range(10):
        state = env.reset()
        done = False
        while not done:
            # This renders the game
            env.render(do_sleep=False)
            # This is where we give an action to the environment
            actions = env.act(state)
            # This performs the step and gives back the new information
            state, reward, done, info = env.step(actions)
        # Our agent is the first in agent_list, so it has index 0
        result = info['result'].name if 0 in info.get('winners', []) or info['result'].name == 'Tie' else 'Lose'
        print(f"Episode: {episode:2d} finished, result: {result}")
    env.close()
You probably already understand why we added the extra state Tie: all matches will result in a Tie unless the agent accidentally moves close to the enemy bot, but that happens very (very) rarely.
Episode: 0 finished, result: Tie
Episode: 1 finished, result: Tie
Episode: 2 finished, result: Tie
Episode: 3 finished, result: Tie
Episode: 4 finished, result: Tie
Episode: 5 finished, result: Tie
Episode: 6 finished, result: Tie
Episode: 7 finished, result: Tie
Episode: 8 finished, result: Tie
Episode: 9 finished, result: Tie
This is a small thing to commit, but if it is the only thing we are going to change in this file, it is still better to commit it now than to commit it together with changes in another file.
git add .
git commit -m "New opponent and show Tie as an extra result."
In the previous lesson we learned how to drop a bomb and stay alive, but the placing of bombs is quite random. We can quickly improve the bomb placing by selecting a position that will clear out some crates. For this we need to find the crates first, so think about how to implement the method find_crates and where we can use it. After you have made your TODO, implement the method.
One solution for this task uses the board and location. By checking the adjacent locations we can find the number of crates that can be destroyed from that location. This means that we ignore the strength of our bombs, but it keeps the code simpler to start with.
If you want to implement the method using the bomb strength, you would end up with code similar to the creation of the danger_map. One advantage of that code is that you can find more good locations, but it assumes that you pick up powerups, for which the agent has no code yet. If you do not pick up any powerups your blast strength is 1. Start simple and make it more complex when it is required, but if you like a challenge feel free.
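For those who like the challenge, a blast-aware count could look roughly like the sketch below. It is not the lesson's solution: the cell values, the `reach` parameter (how many cells the flames travel in each direction) and the rule that flames stop at the first crate or rigid wall are all assumptions made for this standalone example.

```python
import numpy as np

# Hypothetical cell values for this sketch; in the bot they come from the Item enum.
PASSAGE, RIGID, WOOD = 0, 1, 2
DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def count_crates_in_blast(board: np.ndarray, location: tuple, reach: int) -> int:
    """Count the crates hit by a bomb at `location` whose flames travel `reach` cells."""
    crates = 0
    for d_row, d_col in DIRECTIONS:
        for step in range(1, reach + 1):
            row, col = location[0] + step * d_row, location[1] + step * d_col
            # Stop walking in this direction when we leave the board.
            if not (0 <= row < board.shape[0] and 0 <= col < board.shape[1]):
                break
            if board[row, col] == WOOD:
                crates += 1
                break  # Assumption: the first crate absorbs the blast.
            if board[row, col] == RIGID:
                break  # Assumption: rigid walls block the blast.
    return crates


board = np.array([
    [0, 2, 0, 2],
    [2, 0, 0, 0],
    [0, 0, 1, 2],
    [2, 0, 0, 0],
])
print(count_crates_in_blast(board, (1, 2), reach=1))  # 0
print(count_crates_in_blast(board, (1, 2), reach=2))  # 1
```

With a reach of 1 this reduces to the adjacent-only check, which is why starting with the simple version loses little until the agent collects powerups.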
The naïve implementation, which only looks at crates right next to our position, is:
def find_crates(self, board: np.ndarray, location: tuple) -> int:
    """ Returns the number of crates orthogonally adjacent to your position. """
    crates = 0
    for direction in Directions.NEIGHBORS:
        new_point = tuple(direction.array + np.array(location))
        if self.in_bounds(new_point) and board[new_point] == Item.Wood.value:
            crates += 1
    return crates
Why is there no IndexError on board[new_point] if new_point is out of bounds? Conditions joined by and are evaluated lazily, which means Python first checks the left condition and only moves on to the right condition if the left one holds. So if you have expensive checks, put them last.
Currently not implemented. If you want to share your solution to this, please contact the education committee.
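The lazy evaluation of `and` can be tried out in isolation with the small sketch below. The 3x3 board, the crate value 2 and the helper names are made up for this example.

```python
import numpy as np

board = np.zeros((3, 3), dtype=int)
board[0, 1] = 2  # Hypothetical value standing in for a crate.


def in_bounds(point: tuple) -> bool:
    """True when the point lies on the board."""
    return 0 <= point[0] < board.shape[0] and 0 <= point[1] < board.shape[1]


def is_crate(point: tuple) -> bool:
    # in_bounds runs first; board[point] is only read when it returned True.
    return bool(in_bounds(point) and board[point] == 2)


inside = is_crate((0, 1))    # On the board and a crate.
outside = is_crate((0, 3))   # Off the board: the lookup is skipped, so no IndexError.

# Without the guard the same lookup does raise.
try:
    board[(0, 3)]
    raised = False
except IndexError:
    raised = True

print(inside, outside, raised)  # True False True
```

The guard also matters for negative coordinates: NumPy would not raise for `board[(-1, 0)]` but silently wrap around to the last row, which would give a wrong answer instead of an error.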
This is already a good point to do a commit, you just implemented a whole method.
git add .
git commit -m "Added find crates"
With the new method find_crates, the can_place_bomb method can be extended to also check for crates before allowing a bomb to be placed. Update the method with the required code to also check for crates.
In order to check for crates we also need the board, so we have to update the act method as well, since we call can_place_bomb there.
def can_place_bomb(self, board: np.ndarray, bomb_life: np.ndarray, ammo: int, my_location: tuple) -> bool:
    """ Checks if you can place a bomb: no bomb is placed here yet, you have enough ammo and there is a crate next to you. """
    return bomb_life[my_location] == 0 and ammo > 0 and self.find_crates(board, my_location) > 0
if self.can_place_bomb(obs['board'], obs['bomb_life'], obs['ammo'], my_location):
    self.queue.append(Action.Bomb)
Now, with this PlayerAgent opponent, we want to change move_to_safe_location to return locations that have at least something to blow up. If there is nothing to blow up, then you might want to stand still. See if you can update the method to the new required functionality, good luck.
There are probably many ways to achieve the described goal; one of them is explained here. Feel free to make your own implementation and ask the education committee to review it.
We are no longer looking for the first safe location, but for the first safe location that has a crate to explode next to it. For this the following has to be changed:
find_reachable_safe_locations
The code changes will be discussed per method:
In order to obtain all safe locations, find_reachable_safe_location has been adjusted to return all visited locations. For this the early termination has been removed and the return value has been changed:
Early termination is removed.
if danger_map[point] == 0:
    return point
Return value has been changed.
return visited
Bonus: The type hinting return value has been changed, and the name has been updated to locations.
def find_reachable_safe_locations(self, board: np.ndarray, danger_map: np.ndarray, location: tuple) -> list:
In move_to_safe_location we are not interested in all safe_locations, but only in locations that are fully safe, meaning they have a 0 in the danger_map. For this a list comprehension was used instead of a filtering for loop. See the extra tab for more information about this.
fully_safe_locations = [location for location in safe_locations if danger_map[location] == 0]
Loop over every fully_safe_location; if there is any with a crate we will return that location (no list comprehension was used because there is a return statement).
for safe_location in fully_safe_locations:
    if self.find_crates(obs['board'], safe_location):
        return safe_location
Return the first fully_safe_location if it exists, otherwise return our current position.
if fully_safe_locations:
    return fully_safe_locations[0]
return location
def find_reachable_safe_locations(self, board: np.ndarray, danger_map: np.ndarray, location: tuple) -> list:
    to_visit = [location]
    visited = []
    while to_visit:
        point = to_visit.pop(0)
        for direction in Directions.NEIGHBORS:
            new_point = tuple(np.array(point) + direction.array)
            # Filter out the bad points
            if not self.in_bounds(new_point) or new_point in visited or danger_map[new_point] == 1:
                continue
            # If we can reach this point add the point to the to visit list
            if self.check_direction(board, point, direction):
                to_visit.append(new_point)
        visited.append(point)
    # Return every location we were able to reach.
    return visited
def move_to_safe_location(self, obs, location: tuple):
    """ Returns a location to which we can safely move. """
    # Create a mapping of positions and danger level
    danger_map = self.create_danger_map(obs)
    # Collect every reachable location and keep only the fully safe ones.
    safe_locations = self.find_reachable_safe_locations(obs['board'], danger_map, location)
    fully_safe_locations = [location for location in safe_locations if danger_map[location] == 0]
    # Prefer a fully safe location that has a crate next to it.
    for safe_location in fully_safe_locations:
        if self.find_crates(obs['board'], safe_location):
            return safe_location
    # Otherwise take the first fully safe location, or stay where we are.
    if fully_safe_locations:
        return fully_safe_locations[0]
    return location
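To experiment with this selection logic outside of the game, here is a standalone sketch on a toy grid. It is a variation and not the lesson's code: it uses a `deque` and a `set` for the bookkeeping, a boolean `passable` array instead of `check_direction`, and a callback `has_crate_nearby` in place of `find_crates`. The danger values (0 is safe, 1 is deadly, anything else becomes dangerous later) are also an assumption.

```python
from collections import deque

import numpy as np

DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def reachable_locations(passable: np.ndarray, danger_map: np.ndarray, start: tuple) -> list:
    """Breadth-first search that returns every reachable location, nearest first."""
    to_visit = deque([start])
    seen = {start}
    order = []
    while to_visit:
        point = to_visit.popleft()
        order.append(point)
        for d_row, d_col in DIRECTIONS:
            new_point = (point[0] + d_row, point[1] + d_col)
            on_board = 0 <= new_point[0] < passable.shape[0] and 0 <= new_point[1] < passable.shape[1]
            if not on_board or new_point in seen:
                continue
            # Skip walls and locations that are deadly right now.
            if not passable[new_point] or danger_map[new_point] == 1:
                continue
            seen.add(new_point)
            to_visit.append(new_point)
    return order


def pick_target(order: list, danger_map: np.ndarray, has_crate_nearby, fallback: tuple) -> tuple:
    """Prefer a fully safe location with a crate nearby, then any fully safe one, then the fallback."""
    fully_safe = [point for point in order if danger_map[point] == 0]
    for point in fully_safe:
        if has_crate_nearby(point):
            return point
    return fully_safe[0] if fully_safe else fallback


passable = np.array([
    [True, True, True],
    [True, False, True],
    [True, True, True],
])
danger_map = np.array([
    [3, 3, 0],
    [0, 0, 0],
    [1, 0, 0],
])
start = (0, 0)
order = reachable_locations(passable, danger_map, start)
print(pick_target(order, danger_map, lambda point: point == (1, 2), start))  # (1, 2)
print(pick_target(order, danger_map, lambda point: False, start))            # (1, 0)
```

Marking a point as seen when it is queued, instead of when it is popped, prevents the same location from ending up in the result more than once.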
In the provided solution a list comprehension was used. A list comprehension is a shorter way of writing a for loop that builds a list, and it is often slightly faster in Python. A full explanation with examples can be found here. In short, it makes writing for loops that only append (add) data to a list simpler. The used example will be shown both with and without the list comprehension, followed by the general pattern that can be written as a list comprehension.
# Check if our current position is safe, if so we can go/stay there.
safe_locations = self.find_reachable_safe_locations(obs['board'], danger_map, location)
fully_safe_locations = [location for location in safe_locations if danger_map[location] == 0]
# Check if our current position is safe, if so we can go/stay there.
safe_locations = self.find_reachable_safe_locations(obs['board'], danger_map, location)
fully_safe_locations = []
# Note: unlike the comprehension, this loop overwrites the outer `location` variable.
for location in safe_locations:
    if danger_map[location] == 0:
        fully_safe_locations.append(location)
Any pattern that looks like:
var = []                      # Creating placeholder list
for each in iterable:         # For loop
    if condition:             # Some condition
        var.append(each)      # Appending the valid `each` to the placeholder
Can be replaced by:
var = [each for each in iterable if condition]
Side note: people sometimes split the list comprehension into its separate components to make it clearer:
var = [each # The value that is being added
for each in iterable # The for loop
if condition] # The condition
The advice is to only do this if you have to. If your comprehension becomes too complicated, it is almost always better to write it as a for loop instead. In this case complicated means: more than one if statement or a big operation on each.
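If you want to check the speed claim on your own machine, a small timing sketch with the standard `timeit` module could look like this. The filter on even numbers and the function names are made up for the example, and the measured times will differ per machine and Python version.

```python
import timeit

values = list(range(10_000))


def with_loop() -> list:
    """Filter the even numbers using a for loop with append."""
    result = []
    for value in values:
        if value % 2 == 0:
            result.append(value)
    return result


def with_comprehension() -> list:
    """Filter the even numbers using a list comprehension."""
    return [value for value in values if value % 2 == 0]


# Both versions must produce exactly the same list.
assert with_loop() == with_comprehension()

# Run each version 200 times and report the total time in seconds.
loop_time = timeit.timeit(with_loop, number=200)
comprehension_time = timeit.timeit(with_comprehension, number=200)
print(f"for loop: {loop_time:.3f}s, list comprehension: {comprehension_time:.3f}s")
```

Any difference is usually small, so readability should be the main reason to pick one form over the other.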
Starting to see a pattern? Now that we have implemented the usage of find_crates, it would be nice to be able to go back to this point if we suddenly introduce bugs later on.
git add .
git commit -m "Applying find crates, bombs are only dropped by crates and find safe location is prioritizing locations with neighboring crates."
Alternatively, you can use the PyCharm interface for committing. By pressing CTRL + K a commit dialog will pop up, where you have to enter your commit message.
Next to committing, PyCharm can also perform the push action; the shortcut for that is CTRL + SHIFT + K.
Since we are only accepting crates as explodables, we can get stuck when there are no more crates, or if they are obscured by upgrades. This can be prevented by changing find_crates to explodable_neighbors. Now anything that can be exploded should be counted.
When using PyCharm, you can use SHIFT + F6 with the cursor on a variable to rename it. This will automatically rename all instances of (and calls to) that variable. Using SHIFT + F6 on the method name find_crates will also rename the instances in move_to_safe_location and can_place_bomb. For an overview of all refactor actions check popular refactors or right click -> Refactor.
Here we use the in condition to check if a value board[new_point] is contained in a list or tuple of items, or more generally in an iterable object. We put explodable in a separate variable since it contains a lot of items.
def explodable_neighbors(self, board: np.ndarray, location: tuple) -> int:
    """ Returns the number of explodable items orthogonally adjacent to your position. """
    explodable_count = 0
    explodable = [Item.Agent1.value, Item.Wood.value, Item.ExtraBomb.value,
                  Item.IncrRange.value, Item.Kick.value]
    for direction in Directions.NEIGHBORS:
        new_point = tuple(direction.array + np.array(location))
        if self.in_bounds(new_point) and board[new_point] in explodable:
            explodable_count += 1
    return explodable_count
Now we have to compare our agent against two different opponents, the PlayerAgent and the SimpleAgent. The old agent was very good against the SimpleAgent and was able to beat it in roughly 7 out of 10 games. With the new changes there is also the possibility of a Tie.
See what happens when you run the updated bot first against the PlayerAgent and then against the SimpleAgent. You might want to make a few runs, or increase the number of games, to get a good idea of what is happening.
We clearly can beat doing nothing now:
Episode: 0 finished, result: Win
Episode: 1 finished, result: Win
Episode: 2 finished, result: Win
Episode: 3 finished, result: Win
Episode: 4 finished, result: Win
Episode: 5 finished, result: Win
Episode: 6 finished, result: Win
Episode: 7 finished, result: Win
Episode: 8 finished, result: Win
Episode: 9 finished, result: Win
When running the matches against the SimpleAgent you can see our score being all over the place. When you watch the games, you will see that both agents often try to move side by side. This moving side by side is often broken by the SimpleAgent performing a different action from time to time.
Some example runs summary over 10 games:
Run | Win | Tie | Lose | Overall |
---|---|---|---|---|
1 | 3 | 4 | 3 | Tie |
2 | 3 | 4 | 3 | Tie |
3 | 5 | 2 | 3 | Win |
4 | 3 | 2 | 5 | Lose |
5 | 4 | - | 6 | Lose |
The above are examples, but the general idea is that it is basically random whether we win, lose or tie. Which is better than always losing, right?
Now that we have completed our feature, we are going to merge the result into our master copy. For this we first have to make sure that we have committed and pushed everything (git push is only for those who use GitHub).
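When you increase the number of games, counting the results by hand gets tedious. A minimal sketch for tallying them is shown below. The function name `summarize` is made up, it assumes you collect the printed result strings in a list, and it assumes the Overall column simply compares the number of wins with the number of losses.

```python
from collections import Counter


def summarize(results: list) -> dict:
    """Count the 'Win', 'Tie' and 'Lose' strings and decide the overall outcome."""
    counts = Counter(results)
    wins, ties, losses = counts['Win'], counts['Tie'], counts['Lose']
    # Assumption: the overall outcome compares wins against losses.
    if wins > losses:
        overall = 'Win'
    elif wins < losses:
        overall = 'Lose'
    else:
        overall = 'Tie'
    return {'Win': wins, 'Tie': ties, 'Lose': losses, 'Overall': overall}


# The third example run from the table: 5 wins, 2 ties and 3 losses.
run_3 = ['Win'] * 5 + ['Tie'] * 2 + ['Lose'] * 3
print(summarize(run_3))  # {'Win': 5, 'Tie': 2, 'Lose': 3, 'Overall': 'Win'}
```

In run.py you could append `result` to a list at the end of every episode and pass that list to this function after the loop.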
git add .
git commit -m "Lesson 6 - full code"
git push
Now we will check out master, and into master we are going to merge the branch lesson-6 (again, the push is only for those who use GitHub).
git checkout master
git merge lesson-6
git push
Now if you are done with the branch lesson-6, you can remove it using
git branch -d lesson-6
The first push was for the branch lesson-6, the second push is for the branch master. This means that both branches can be updated separately. When both branches have changes in the exact same location, you have created a merge conflict. If you have trouble resolving that, please contact the education committee.
In this lesson we have compared our bot against both PlayerAgent, a bot that does nothing, and SimpleAgent, our old archenemy. In the beginning it was no problem to beat SimpleAgent most of the time, but we would almost always get a Tie against PlayerAgent.
After the new improvements we were able to always beat PlayerAgent, but we were no longer able to beat SimpleAgent as easily as we did before. This kind of tradeoff is very common in game bots, where an improvement against one opponent weakens your results against another.