In case you have any remarks or questions they are always welcome at either the education commissie, via the slack channel ec-helpme, or at our e-mail address education@serpentineai.nl.
In this lesson we are going to talk about the following things:
The GitHub links for this lesson are: Browse, Zip, Diff.
The links are currently only available for members.
In the previous lessons we have started to add more and more code and now we may have noticed that some things are not optimally written. For example checking for the different directions in the BFS are 4 different if statements. Also we have seen that there is a better way to represent a movement in a grid then using row-1, col
, namely by converting the direction to an array, for example the np.array([-1, 0])
for going up.
It is common and also a good practice to update your code over time if you have a better idea of what functionalities you need or if your single function/method becomes very large (a guideline is more than 15 lines of code). This process is called refactoring
and is usually a good thing to do from time to time. So let's start cleaning up and applying the Single Responsibility Principle our code.
This is directly taken from Wikipedia, Code refactoring, and especially the last paragraph is very important.
In computer programming and software design, code refactoring is the process of restructuring existing computer code—changing the factoring—without changing its external behavior. Refactoring is intended to improve the design, structure, and/or implementation of the software (its non-functional attributes), while preserving its functionality. Potential advantages of refactoring may include improved code readability and reduced complexity; these can improve the source code's maintainability and create a simpler, cleaner, or more expressive internal architecture or object model to improve extensibility. Another potential goal for refactoring is improved performance; software engineers face an ongoing challenge to write programs that perform faster or use less memory.
Typically, refactoring applies a series of standardised basic micro-refactorings, each of which is (usually) a tiny change in a computer program's source code that either preserves the behaviour of the software, or at least does not modify its conformance to functional requirements. Many development environments provide automated support for performing the mechanical aspects of these basic refactorings. If done well, code refactoring may help software developers discover and fix hidden or dormant bugs or vulnerabilities in the system by simplifying the underlying logic and eliminating unnecessary levels of complexity. If done poorly, it may fail the requirement that external functionality not be changed, introduce new bugs, or both.
By continuously improving the design of code, we make it easier and easier to work with. This is in sharp contrast to what typically happens: little refactoring and a great deal of attention paid to expediently adding new features. If you get into the hygienic habit of refactoring continuously, you'll find that it is easier to extend and maintain code.
@dataclass
From the above we notice, that the directions can probably be written in a more compact and simpler way, so how are we going to do that? In the previous lesson we saw how an array could be used to represent a direction, and together by storing it in a dictionary it was easier to go over all directions, recall dict(left=np.array([0, -1]), ...)
.
Now if we could also add the Action
to the dictionary, we have all information in one place. This means that we only need to look at one place for directional information. We could create a dict
in a dict
:
directions = dict(left=dict(array=np.array([0, -1]), action=Action.Left),
right=dict(array=np.array([0, 1]), action=Action.Right),
etc...)
This will work, but makes accessing an element a bit harder, we would have to type directions['left']['action']
, to get Action.Left
. A nicer way would be to access it using directions.left.action
.
There are 2 big advantages when we access the value using dots. Firstly we do not have to type the quotes for strings. But more importantly, while coding, Pycharm can give you hints on what attributes (methods) you can access from a variable. Try pressing CTRL + SPACE while putting the cursor after to_visit.
(note the dot .
in there), this will show a list of possible methods. So if we are unsure what directions.left
entails, we can let the IDE (e.g. Pycharm) tell us that options are array
and action
.
Now how are we going to make that pattern of accesing information by using a dot?
Tip: think about how we access the methods and variables inside
MyAgent
.
If you are guessing that we can use a class
for storing the directional information you are on the right track. Since the values array
and action
have to be initialized we can use a dataclass. This data structure is similar to a class, but has already some basic functionality for initialization and printing.
Now is it possible to add multiple classes (and dataclasses) in a single file, but generally it is more acceptable to split them out over multiple files if the classes are very different to each other. In our case MyAgent
and Directions
(the new dataclass) will be very different in functionality and it would be better to split them out over two files. Therefore create a new file called directions.py
in the serpentine
folder.
To create the dataclass we need to import the dataclass, therefore add this line: from dataclasses import dataclass
just below the numpy import. Now in order to create a Direction
dataclass, we need to know what we want to store, what are the names and types of what we need to store per direction?
When the names and types are known, try and finish the dataclass Direction
. To help you out, the following example is given in the dataclass documentation:
from dataclasses import dataclass
from typing import Any
@dataclass
class WithoutExplicitTypes:
name: Any
value: Any = 42
name: str
array: np.array
action: Action
It is convenient to also store the name in case we want to verify that we have the right direction, if we are iterating over all kinds of directions. It is definitly easier than looking at the numpy array and slightly easier than looking at the action. It is more out of convience for Human interpretability and debugging than a necessity.
What is meant by looking at the name of an action can be retrieved by using .name
, therefore Action.Right.name
will return the value Right
.
@dataclass
class Direction:
name: str
array: np.array
action: Action
With this we have all the information of a single direction stored into one class. Now if we want to make a variable that stores all the information of the left direction we can do the following:
LEFT = Direction(name="left", array=np.array([0, -1]), action=Action.Left)`.
The name is capatilized to indicate that the value is not changing, hence the value is a constant. There are alot of conventions about best code practices, and for Python PEP8 is used. For now it is not important to know all of them, and they will become clear over time.
If you are really wondering why there is an @dataclass
name above the class, the answer is due to that everything in python is an object, so we can create an object that takes an object as input. This is called a decorator, if you do not understand it yet, don't worry, it is just extra information.
Now that we are able to create a single direction we are going to store all possible directions, in a new structure. Since this is going to hold only constant values a normal class
is enough.
There are 4 direction, (left
, right
, up
, down
), but there are 5 actions (also stop
). It would be convention to also have a direction for the stop
action, namely zero
. Create a second class called Directions
(note the extra 's') in which we store all the directions, including the zero
.
class Directions:
ZERO = Direction(name="zero", array=np.array([0, 0]), action=Action.Stop)
LEFT = Direction(name="left", array=np.array([0, -1]), action=Action.Left)
RIGHT = Direction(name="right", array=np.array([0, 1]), action=Action.Right)
UP = Direction(name="up", array=np.array([-1, 0]), action=Action.Up)
DOWN = Direction(name="down", array=np.array([1, 0]), action=Action.Down)
By adding nothing to the rows and columns, we can indicate a zero
direction or the stop
action. Therefore we have np.array([0, 0])
.
To finish it off we will add one more line to Directions
, namely NEIGHBOURS = (LEFT, RIGHT, UP, DOWN)
. This is going to be used when we want to iterate over all the directions, in other words go over all neighbours LEFT, RIGHT, UP and DOWN one by one. This enables us to write the code in a similar way to what we did with the dictionary, namely:
for direction in Directions.NEIGHBOURS:
...
We use ( )
, instead of [ ]
to make the values immutable (unchangeable), the difference is explained in more depth here. It is common for constants to be made immutable, and for changing variables, such as our queue
, to be made muttable (changeable).
directions.py
import numpy as np
from dataclasses import dataclass
from pommerman.constants import Action
@dataclass
class Direction:
name: str
array: np.array
action: Action
class Directions:
ZERO = Direction(name="zero", array=np.array([0, 0]), action=Action.Stop)
LEFT = Direction(name="left", array=np.array([0, -1]), action=Action.Left)
RIGHT = Direction(name="right", array=np.array([0, 1]), action=Action.Right)
UP = Direction(name="up", array=np.array([-1, 0]), action=Action.Up)
DOWN = Direction(name="down", array=np.array([1, 0]), action=Action.Down)
NEIGHBOURS = (LEFT, RIGHT, UP, DOWN)
With the new Directions class we might be able to clean up the checking of passable fields. For this we will first import
the Directions
class by adding the following line below the pommerman
import.
from serpentine.directions import Direction, Directions
Now to combine all the different check methods we have to find the pattern that links the different methods check_left
, check_right
, check_up
and check_down
. To answer this question it can help to think about what the input arguments would be, if this would have been just one method check_direction
.
When you know the answer, try to implement a single method called check_direction
, with the proper type hints and docstring.
Since we started with 4 different methods that each got a specific directions, we now have to give the directions along to this one method. If we call our function check_direction
, the following line would summarize the function inputs
def check_direction(self, board: np.array, location: tuple, direction: Direction) -> bool:
pass
Direction
object with an array
.can_move_to
?if
statement. This means that the statement in an if
is already a boolean expression. Do we really need the return True
and return False
? def in_bounds(self, location: tuple) -> bool:
""" Checks if a location is on the board, if it is returns True. """
return 0 <= min(location) and max(location) <= 7
def check_direction(self, board: np.array, location: tuple, direction: Direction) -> bool:
"""
Checks for a given location and direction if the new location is a Passage.
:param board: The game board
:param location: The location from which you start to move
:param direction: The direction in which you are going to move (This is the Direction class)
:return: True if the new location is a passage, False otherwise.
"""
new_location = np.array(location) + direction.array
if not self.in_bounds(new_location):
return False
# Note that this is already a boolean (so no need for if statements)
return board[tuple(new_location)] == Item.Passage.value
if
and the return True
, return False
statement. Since the condition in the if
statement is already a boolean. Therefore we can return it directly.So now went from 4 different methods to one method with an extra argument, and a helper method. This is greatly reducing the lines of code we need and comes at a small cost in complexity, since we need to give in an extra argument. The trade-off point is hard to determine: when should I keep combining functions and when should I be making my functions smaller? The general rules of thumb are summarized by Pythoneer Tim Peters in The Zen of Python. (More information about line 14 of The Zen of Python can be found here.)
Now back to our code, at the beginning it is fine to have to different codes standing next to each other, but in the end we only want to use the better and cleaner code. So refactor our code by improving the Act
and the BFS (can_move_to
) methods. This will allow you to remove all check
methods except check_direction
.
By using the new Directions
class, the 4 if
statements can be replaced with a loop over Directions.ALL
.
for direction in Directions.NEIGHBOURS:
if self.check_direction(board, my_location, direction):
self.queue.append(direction.action)
We can test by commenting out the self.can_move_to
, by putting a #
in front of it, or when using PyCharm, the shortcut CTRL + /.
Now because of the order of NEIGHBOURS
, the agent will walk in a different pattern than a square. It will first go left and right and then up and down, if possible.
def can_move_to(self, board: np.array, my_location: tuple, goal_location: tuple) -> bool:
""" BFS to a goal location. Returns True if it can be reached."""
to_visit = [my_location]
visited = []
while to_visit:
point = to_visit.pop(0)
for direction in Directions.NEIGHBOURS:
# By making it a numpy array we can add the values
new_point = tuple(np.array(point) + direction.array)
# Check if we have found the location
if new_point == goal_location:
return True
# We have already seen that point
if new_point in visited:
continue
# If we can reach this point add the point to the to visit list
if self.check_direction_passable(board, point, direction):
to_visit.append(new_point)
visited.append(point)
return False
Thinking back about the code that we started with, is this still the way you would have coded it now?
It is important to think a bit ahead of what you want to do with your code and also to sometimes clean up your code in the process. This looking back at your code and changing it to simplify it or reusing certain parts is called refactoring, and is a very common thing to do over time.
Some times it is easy to spot when you can reuse your code, but other times it can be quite hard. There are some specific code smells that are known to be bad, but mostly by using your common sense you can spot them yourself. Another thing to help you write good understandable code is by using a common way of writing the code and comments. Think for example about the places a comment can be placed, above the method, below the method or even inline. The coding standard we are using in our tutorial (and usually the rest of Serpentine as well) is PEP8, which is widely used in Python.
Now feel free to take a break and a walk outside before continuing with the next part.
Now that we have cleaned up our code it is time to add some new code (woohoo). The BFS was only checking if we could reach a point. Now we also have to tell our agent, how it can actually go there. For this we have to update the BFS with parent information. This means that during the BFS we need to know which node was the parent of a specific other node.
For updating the BFS with parent information we have a few options:
Node
to store all the information we need;came_from
and came_from_direction
for all nodes.Any of the above implementations will work, and that is an important aspect in coding. Often there are several ways to do a single task, but some solutions use less memory, other less time and even other are much easier to implement. For example the last solution is relative easy to implement without having to add new classes or mainting tuple indices in the BFS.
For this tutorial we will focus on the 3th solution, by using the two extra dictionaries. For this you first have to initialize the two dictionaries came_from
and came_from_direction
. Then fill the dictionary with their relevant information:
came_from
: should contain as key a location (tuple
), and as value the parent location (tuple
). Except for the starting location, which will have the parent None
.came_from_direction
: should contain as key a location (tuple
) and as value the direction taken to get there.This means that both dictionaries together can provide you with a path. The first one tells you the parent node and the second one tells you the direction you had to take from the parent node.
Implement the dictionaries and their updating, to add a new value to a dictionary you can use:
came_from = dict()
came_from[key] = value
P.S. In case you want to implement a different solution or have questions about them, you can always aks the the education committee for help, either via slack or email.
def can_move_to(self, board: np.array, my_location: tuple, goal_location: tuple) -> bool:
""" BFS to a goal location. Returns True if it can be reached."""
to_visit = [my_location]
visited = []
came_from = dict()
came_from[my_location] = None
came_from_direction = dict()
came_from_direction[my_location] = Directions.ZERO
while to_visit:
point = to_visit.pop(0)
for direction in Directions.NEIGHBOURS:
# By making it a numpy array we can add the values
new_point = tuple(np.array(point) + direction.array)
# Either the row or column value is not on the board.
if not self.in_bounds(new_point):
continue
# We have already seen that point
if new_point in visited:
continue
if point == goal_location:
return True
# If we can reach this point add the point to the to visit list
if self.check_direction(board, point, direction):
to_visit.append(new_point)
came_from[point] = new_point
came_from_direction[point] = direction
visited.append(point)
return False
In the first part we created the dictionaries, now we have to create a path from the dictionaries. This will be a list similar to self.queue
, that tells us which Direction
we have to take, starting from our current position and leading to the goal position. For this it is handy to create a new method reverse_path
.
This will take as input came_from
, came_from_directions
and the goal_location
, and will return a list
of Direction
, which tells us the directions we have to follow to go from start to the end position. We can deduce the start location from came_from
, because the parent of the start location will be None
.
To create the reverse_path
method, some extra knowledge might be handy about how to handle dictionary access and list reversals. Know if you like a challenge see if you can implement the method, the tabs can be used for helping you out and are setup as follow:
which we will let you build .
When you have a list you can reverse the order quickly in two ways:
my_list = [1, 2, 3]
# Method 1
my_list.reverse() # now contains [3, 2, 1]
# Method 2
my_list = list(reversed(my_list)) # now contains [3, 2, 1]
None
if you try to assing it, in any case (my_list = my_list.reverse()
will return assign None
instead of [3, 2, 1]
to the variable my_list
).my_list =
, my_list
will still be in the order [3, 2, 1]
.If you try to access a key that is not located in dactionary you will get a KeyError
. This can be prevented by checking if a key already exists in the dictionary by using if key in my_dict
. But sometimes you want to return a default value instead, this can be done using my_dict.get(key, None)
, where None
is the default value that will be returned if the key does not exists.
The steps are general steps for creating any method/function:
came_from
, came_from_directions
and goal_location
.list[Direction]
.came_from
contains nodes that when entered in came_from_directions
return the direction you have to take to get there.if
, for
, while
, reverse
.
for
or while
.list
instead of list[Direction]
.)Tip: when testing you can use the debug
functionality of Pycharm, check where is the problem for some debuggin examples.
self
.list
is starting from the goal and working back to the start, since we know the came_from
parent value will be None
.goal_location
and None
, we will need a while
loop that follows the parents of nodes, starting at goal_location
.def reverse_path(self, came_from: dict, came_from_direction: dict, goal_location: tuple) -> list:
""" Returns a list of directions from a start node with the parent None, leading to the goal node. """
current = goal_location
parent = came_from.get(goal_location, None)
path = []
while parent is not None:
path.append(came_from_direction[current])
current, parent = parent, came_from.get(parent, None)
return list(reversed(path))
Extra:
The last thing we will have to do is apply reverse_path
, therefore we have to replace both return
statements with a proper path list. In order to make that easier we can adjust our code to only have 1 return statement. This can be done by moving up if point == goal_location
to be before the for
loop and use break
instead of return
.
By using break
we stop the while
loop and go directly to the return False
statement. This False
can then be replaced by the reverse_path
. The last two things that we will have to do are:
can_move_to
can_move_to
to the more descriptive name create_path
. Renaming in pycharm can be done by clicking on the name and pressing SHIFT + F6. def create_path(self, board: np.array, my_location: tuple, goal_location: tuple) -> list:
""" BFS to a goal location. Returns True if it can be reached."""
to_visit = [my_location]
visited = []
came_from = dict()
came_from[my_location] = None
came_from_direction = dict()
came_from_direction[my_location] = Directions.ZERO
while to_visit:
point = to_visit.pop(0)
if point == goal_location: break
for direction in Directions.NEIGHBOURS:
# By making it a numpy array we can add the values
new_point = tuple(np.array(point) + direction.array)
# Either the row or column value is not on the board.
if not self.in_bounds(new_point):
continue
# We have already seen that point
if new_point in visited:
continue
# If we can reach this point add the point to the to visit list
if self.check_direction(board, point, direction):
to_visit.append(new_point)
came_from[new_point] = point
came_from_direction[new_point] = direction
visited.append(point)
return self.reverse_path(came_from, came_from_direction, goal_location)
BEfore we can run and see our code we have to change the act
method. For this we have to replace the if
statement with the following for
loop:
for direction in self.create_path(board, my_location, goal_location):
self.queue.append(direction.action)
When the return value from create_path
is empty, in any case []
. The for
loop won't run and the code will happily go on. Test it out and play a bit with it, try and see what happens when you:
for direction in Directions.NEIGHBOURS
for direction in Directions.NEIGHBOURS
goal_location
to a different point, and check if it was reachable or not. def act(self, obs, action_space):
# Main event that is being called on every turn.
if not self.queue:
my_location = obs['position']
board = obs['board']
goal_location = (2, 2)
for direction in self.create_path(board, my_location, goal_location):
self.queue.append(direction.action)
# for direction in Directions.NEIGHBOURS:
# if self.check_direction(board, my_location, direction):
# self.queue.append(direction.action)
# If we cannot move in any direction, send a pass
if not self.queue:
self.queue.append(Action.Stop)
return self.queue.pop(0)
You know the drill ...
git add *
git commit -m "finished Pommerman lesson 4"
git push
(If you're still wondering why you do this, contact the Education Committee via the slack channel #ec-helpme.)
In this lesson we have done some (a lot of) cleaning up of our code, refactoring, and have seen how we can use a dataclass to store all kinds of information about the same object. The cleaning resulted in changing 4 mehods into a single method for the checking of movements and changing 4 if
stamentents to one single for
loop.
In the last part of the lesson we got the directional information for our bot from the BFS (can_move_to
), so we can use it to get to our goal. If we run it we will see that it moves to our goal location (2, 2) if it is possible. In our next lesson we will start to place (and avoid) bombs.