Day 12 problem projects us the world of graphs. TensorFlow can be used to work on graphs pretty easily since a graph can be represented as an adjacency matrix, and thus, we can have a tf.Tensor containing our graph. However, the “natural” way of exploring a graph is using recursion, and as we’ll see in this article, this prevents us to solve the problem using a pure TensorFlow program, but we have to work only in eager mode.
Day 12: Passage Pathing
You can click on the title above to read the full text of the puzzle. The TLDR version is:
Our dataset is a list of connections with this format:
start-A
start-b
A-c
A-b
b-d
A-end
b-end
where start
and end
are your start and end points, the upper case letters (e.g A
) are the “big caves”, and the lower case letters (e.g. c
) are the “small caves”.
The above cave system looks roughly like this:
start
/ \
c--A-----b--d
\ /
end
The puzzle asks to find the number of distinct paths that start at start
, and end at end
, knowing that you can visit the “big caves” multiple times, and the small caves only once (per path).
Given these rules, there are 10 paths through this example cave system (graph)
start,A,b,A,c,A,end
start,A,b,A,end
start,A,b,end
start,A,c,A,b,A,end
start,A,c,A,b,end
start,A,c,A,end
start,A,end
start,b,A,c,A,end
start,b,A,end
start,b,end
Design phase: part one
The problem asks us to:
- Model the relationships between the nodes
- Finding the paths from
start
toend
that follow the two rules about the big/small caves
Modeling the relationships between nodes is something we can achieve by mapping the neighboring relationship among nodes using an adjacency matrix. In practice, we can assign an unique index to a node (e.g. “start” = 0, “end” = 1, “A” = 2, …), and each index identifies a row/column in the adjacency matrix. Whenever a node is connected to another, we put a “1” in the correct location.
For example, a possible adjacency matrix for the example graph is
\[\begin{pmatrix} 0&1&1&0&0&0\\ 1&0&1&1&0&1\\ 1&1&0&0&1&1\\ 0&1&0&0&0&0\\ 0&0&1&0&0&0\\ 0&1&1&0&0&0\\ \end{pmatrix}\]As it can be easily seen the adjacency matrix for a graph without self connections is a hollow matrix that means that all the elements along the diagonal are equal to zero. Moreover, the adjacency matrix is also symmetric (rows and columns can be transposed and the matrix doesn’t change - and it makes sense since if A
is connected with B
also B
is connected with A
).
Thus, we first need to create the mapping between the “human-readable” nodes and the IDs, then characterize these IDs. In fact, we have some rules to follow. All the upper case nodes (and thus the corresponding numeric IDs) can be visited only once, while the other can be visited multiple times.
Moreover, the start
and end
nodes are special, since if we reach the end
the path is complete and we can exit from our search algorithm, while the start
node shouldn’t be visited more than once (at the beginning and then ignored).
Input pipeline
We create a tf.data.Dataset
object for reading the text file line-by-line as usual. We split every line looking at the -
separator, and then create the relationships between the human-readable nodes and the indexes of the adjacency matrix
connections = tf.data.TextLineDataset("input").map(
lambda string: tf.strings.split(string, "-")
)
# Create a map between human-readable node names and numeric indices
human_to_id = tf.lookup.experimental.MutableHashTable(tf.string, tf.int64, -1)
id_to_human = tf.lookup.experimental.MutableHashTable(tf.int64, tf.string, "")
human_to_id
is the hashtable we use to create the mapping between the human-readable node name and a numeric ID. tf.lookup.experimental.MutableHashTable
returns the third parameter (-1) when the lookup fails, and thus we can use this feature to check if we already created the mapping for a node.
Adjacency matrix
Since we want to create an adjacency matrix, we need to keep track of the adjacency relations between nodes (the indices
we use for setting the 1
in the correct positions of the adjacency matrix), we use a tf.TensorArray
for doing it.
idx = tf.Variable(0, dtype=tf.int64)
indices = tf.TensorArray(tf.int64, size=0, dynamic_size=True)
for edge in connections:
node_i = human_to_id.lookup(edge[0])
node_j = human_to_id.lookup(edge[1])
if tf.equal(node_i, -1):
human_to_id.insert([edge[0]], [idx])
id_to_human.insert([idx], [edge[0]])
node_i = tf.identity(idx)
idx.assign_add(1)
if tf.equal(node_j, -1):
human_to_id.insert([edge[1]], [idx])
id_to_human.insert([idx], [edge[1]])
node_j = tf.identity(idx)
idx.assign_add(1)
ij = tf.convert_to_tensor([node_i, node_j])
indices = indices.write(indices.size(), ij)
We loop over all the connections, check if the node has already been mapped and, if not, increase the counter (idx
) and then create the mapping. Then, we create the relationship between the i
and j
nodes and save it inside the tf.TensorArray
.
At the end of this loop, the indices
variable contains all the pairs of coordinates where a connection is, while idx
contains the ID of the latest mapped item, which also is the width (height) of the matrix.
We should also note that we created the relationship from i
to j
, hence we created an upper triangular matrix (a matrix whose elements below the main diagonals are zero).
For modeling the full relationship i
to j
and j
to i
we can just transpose the matrix and sum it with itself.
indices = indices.stack()
indices = tf.reshape(indices, (-1, 2))
A = tf.tensor_scatter_nd_update(
tf.zeros((idx, idx), dtype=tf.int64),
indices,
tf.repeat(tf.cast(1, tf.int64), tf.shape(indices)[0]),
)
A = A + tf.transpose(A)
Here we go! A
is the adjacency matrix that completely models the graph.
The only thing missing is the modeling of the constraints. We need a list-like object (a tf.Tensor
) containing all the IDs of the nodes that we can visit once, and the difference between this set of IDs and the full list of IDs for getting the list of the IDs we can visit multiple times.
keys = human_to_id.export()[0]
visit_only_once_human = tf.gather(
keys,
tf.where(
tf.equal(
tf.range(tf.shape(keys)[0]),
tf.cast(tf.strings.regex_full_match(keys, "[a-z]+?"), tf.int32)
* tf.range(tf.shape(keys)[0]),
)
),
)
visit_only_once_human = tf.squeeze(visit_only_once_human)
visit_only_once_id = human_to_id.lookup(visit_only_once_human)
# Visit multiple times = {keys} - {only once}
visit_multiple_times_human = tf.sparse.to_dense(
tf.sets.difference(
tf.reshape(keys, (1, -1)), tf.reshape(visit_only_once_human, (1, -1))
)
)
visit_multiple_times_human = tf.squeeze(visit_multiple_times_human)
visit_multiple_times_id = human_to_id.lookup(visit_multiple_times_human)
It’s interesting how we used regular expressions in TensorFlow with the tf.strings.regex_full_match
function for searching for nodes all in lower case ([a-z]+?
- visit only once).
Moreover, we used the set difference together with the conversion from sparse to dense to get a tf.Tensor
containing all the upper-case nodes (visit multiple times).
Visiting the graph
The natural way of facing this problem is by using recursion. In fact, the problem can be modeled as follows
Concatenate the input
node
to thecurrent_path
.Is the current node the
end
node? If yes, we found the path. Increment by 1 the counter of the paths. Return.Otherwise, find the neighborhood of the current node.
For every neighbor verify if the neighbor is in the “visit only once” list and it’s in the current path. If yes, skip this neighbor.
Otherwise, visit the neighbor (recursive step).
Being the adjacency matrix symmetric, the step of “find the neighborhood of the current node” can be simply modeled as a search for the 1
in the row whose id is node_id
.
@tf.function
def _neigh_ids(A, node_id):
return tf.squeeze(tf.where(tf.equal(A[node_id, :], 1)))
Since the proposed solution uses the recursion we cannot decorate the visit
function with @tf.function
because tf.function does not support recursion. Thus, we are forced to solve this problem in eager mode.
Going back to the problem, the puzzle asks us to count the number of paths hence we need a tf.Variable
for counting the number of paths found. Moreover, instead of repeating on every iteration a lookup in the hastable, we can extract the IDs of the start
and end
nodes.
start_id, end_id = human_to_id.lookup(["start", "end"])
count = tf.Variable(0, dtype=tf.int64)
The visit function can be defined by implementing precisely what has been proposed in the algorithm described above.
def _visit(A: tf.Tensor, node_id: tf.Tensor, path: tf.Tensor):
current_path = tf.concat([path, [node_id]], axis=0)
if tf.equal(node_id, end_id):
count.assign_add(1)
return current_path
neighs = _neigh_ids(A, node_id)
neigh_shape = tf.shape(neighs)
if tf.equal(tf.size(neighs), 0):
return current_path
if tf.equal(tf.size(neigh_shape), 0):
neighs = tf.expand_dims(neighs, 0)
neigh_shape = tf.shape(neighs)
for idx in tf.range(neigh_shape[0]):
neigh_id = neighs[idx]
if tf.logical_and(
tf.reduce_any(tf.equal(neigh_id, visit_only_once_id)),
tf.reduce_any(tf.equal(neigh_id, current_path)),
):
continue
# Recursion step
_visit(A, neigh_id, current_path)
return current_path
Execution
To call the _visit
function we need:
A
the adjacency matrixnode_id
a node for starting the visitpath
the initial path (atf.Tensor
containing only the ID of thestart
node).
# All the paths starts from start
neighs = _neigh_ids(A, start_id)
for idx in tf.range(tf.shape(neighs)[0]):
neigh_id = neighs[idx]
_visit(A, neigh_id, [start_id])
tf.print("Part one: ", count)
Here we go! Part one is solved! Let’s see what part two is about.
Design phase: part 2
The puzzle relaxes a constraint: in a path, we can pass twice for a single small cave. For example start,A,b,A,b,A,c,A,end
is a valid path because we visited b
twice and c
once.
Given this relaxed constraint, how many paths through the graph are there?
Part two implementation
It is possible to simply add the new constraint to the previous algorithm. We can define an inner_count
tf.Variable
for counting the number of times small cave in the current path has been visited. When this counter is greater or equal to 2
then the path we are following is invalid and we can return it.
count.assign(0)
inner_count = tf.Variable(0)
def _visit2(A: tf.Tensor, node_id: tf.Tensor, path: tf.Tensor):
current_path = tf.concat([path, [node_id]], axis=0)
# Skip start
if tf.equal(node_id, start_id):
return current_path
# Success on end node
if tf.equal(node_id, end_id):
# paths.append(current_path)
count.assign_add(1)
return current_path
# More than 2 lowercase visited twice
visited, visited_idx, visited_count = tf.unique_with_counts(current_path)
visited = tf.gather_nd(visited, tf.where(tf.greater(visited_count, 1)))
inner_count.assign(0)
for idx in tf.range(tf.shape(visited)[0]):
if tf.reduce_any(tf.equal(visited[idx], visit_only_once_id)):
inner_count.assign_add(1)
if tf.greater_equal(inner_count, 2):
return current_path
neighs = _neigh_ids(A, node_id)
neigh_shape = tf.shape(neighs)
if tf.equal(tf.size(neighs), 0):
return current_path
if tf.equal(tf.size(neigh_shape), 0):
neighs = tf.expand_dims(neighs, 0)
neigh_shape = tf.shape(neighs)
for idx in tf.range(neigh_shape[0]):
neigh_id = neighs[idx]
# already visited twice and is lowcase
if tf.logical_and(
tf.reduce_any(tf.equal(neigh_id, visit_only_once_id)),
tf.greater(
tf.reduce_sum(tf.cast(tf.equal(neigh_id, current_path), tf.int32)),
1,
),
):
continue
_visit2(A, neigh_id, current_path)
return current_path
The execution is performed in the very same way:
neighs = _neigh_ids(A, start_id)
for idx in tf.range(tf.shape(neighs)[0]):
neigh_id = neighs[idx]
_visit2(A, neigh_id, [start_id])
tf.print("Part two: ", count)
It takes a lot of time since we are effectively doing the graph visit, even if the puzzle asked us to just count the number of paths (without modeling them!). Hence for sure, there exists a better solution, very similar to the solution I found to the Day 6 puzzle part two, but for this time the slow solution will be good enough. In fact, after about 20 minutes of computations the output produced is corrected :)
Conclusion
You can see the complete solution in folder 12
on the dedicated Github repository: https://github.com/galeone/tf-aoc.
Solving this problem in TensorFlow demonstrated how the adjacency matrix representation is really handy while working with graphs, but also demonstrated that is impossible to write recursive pure-TensorFlow programs (with `tf.function).
If you missed the articles about the previous days’ solutions, here’s a handy list:
I already solved also part 1 of day 13, but I don’t know if I have the time for solving the puzzles and writing the articles (holidays have reached the end some week ago :D).
So maybe, for the 2021 Advent of Code in pure TensorFlow is all.
For any feedback or comment, please use the Disqus form below - thanks!