Advent of Code 2021 in pure TensorFlow - day 9


The day 9 challenge can be seen as a computer vision problem. TensorFlow contains some computer vision utilities that we’ll use - like the image gradient - but it’s not a complete computer vision framework (like OpenCV). However, the framework offers primitive data types like tf.TensorArray and tf.queue that we can use to implement a flood-fill algorithm in pure TensorFlow and solve the problem.

Day 9: Smoke Basin

You can click on the title above to read the full text of the puzzle. The TLDR version is:

Our dataset is a heightmap where each number corresponds to the height of a particular location. 9 is the maximum value, 0 is the lowest.

2199943210
3987894921
9856789892
8767896789
9899965678

Our first goal is to find the low points - the locations that are lower than any of their adjacent locations. Most locations have four adjacent locations (up, down, left, and right); locations on the edge or corner of the map have three or two adjacent locations, respectively. (Diagonal locations do not count as adjacent.)

Once the low points have been found, we must compute the sum of the risk levels. The risk level of a low point is 1 plus its height.

In the example above, we have 4 low points (with heights 1, 0, 5, and 5), and the sum of their risk levels is 2 + 1 + 6 + 6 = 15.

Design phase: part one

The heightmap can be seen as a 2D image. Every height is a pixel, and every pixel has its 4-neighborhood. We consider only the 4-neighborhood because we are asked to look only at the horizontal and vertical neighbors, not at the diagonal ones.

We can loop over the image in a sliding window fashion, centering a 3x3 kernel on every location and, for each location, searching for the lowest point.

Finding the low points requires looking at every pixel, extracting its 4-neighbors, and finding the minimum value among the 5 pixels (the neighbors plus the pixel itself). To deal with border pixels, we can just pad the input image with the correct amount of 9s (the maximum value, hence a point that can never be a low point).
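
As a quick sketch of the padding idea (the 2x2 grid below is made up just for illustration), padding with the maximum value guarantees that a border pixel never sees an out-of-bounds neighbor lower than itself:

import tensorflow as tf

toy = tf.constant([[2, 1], [3, 9]])
padded = tf.pad(toy, [[1, 1], [1, 1]], mode="CONSTANT", constant_values=9)
# padded is now:
# [[9 9 9 9]
#  [9 2 1 9]
#  [9 3 9 9]
#  [9 9 9 9]]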

Input pipeline

We create a tf.data.Dataset object to read the text file line by line, as usual.

from typing import Tuple

import tensorflow as tf

dataset = (
    tf.data.TextLineDataset("input")
    .map(tf.strings.bytes_split)
    .map(lambda string: tf.strings.to_number(string, out_type=tf.int32))
)
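
A quick sanity check, assuming the "input" file contains the example heightmap shown above: every dataset element is now a 1D tf.int32 tensor.

for row in dataset.take(1):
    tf.print(row)  # [2 1 9 9 9 4 3 2 1 0]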

Since we consider the whole dataset as a single image, we can convert it to a 2D tensor. We’ll do it directly in the constructor of the Finder object we are going to create for solving the problem. Moreover, since we are interested in the padded input image, we can also compute the padding amount and pad the image using tf.pad.

class Finder(tf.Module):
    def __init__(self, dataset: tf.data.Dataset):

        super().__init__()

        self._dataset = dataset
        # Convert the whole dataset to a single 2D image
        self._image = tf.convert_to_tensor(list(self._dataset))
        self._shape = tf.shape(self._image)
        self._max_value = tf.reduce_max(self._image)

        # Compute the amount of padding required on each dimension
        if tf.not_equal(tf.math.mod(self._shape[0], 3), 0):
            pad_h = self._shape[0] - (self._shape[0] // 3 + 3)
        else:
            pad_h = 0
        if tf.not_equal(tf.math.mod(self._shape[1], 3), 0):
            pad_w = self._shape[1] - (self._shape[1] // 3 + 3)
        else:
            pad_w = 0

        # Pad with the maximum value (9): padded locations can never be low points
        self._padded_image = tf.pad(
            self._image,
            [[0, pad_w], [1, pad_h]],
            mode="CONSTANT",
            constant_values=self._max_value,
        )

        self._padded_shape = tf.shape(self._padded_image)

Finding the 4-neighbors

We’ll loop over the image in a sliding window fashion, and for every pixel, we’ll search for the 4-neighborhood. Hence, we can define a function that, given an input grid (a 2D tensor) and the coordinates of a pixel (in TensorFlow format, hence (y, x)), returns both the neighborhood values and the neighborhood coordinates.

@tf.function
def _four_neigh(
    self, grid: tf.Tensor, center: tf.Tensor
) -> Tuple[tf.Tensor, tf.Tensor]:
    # Offsets of the 4-neighborhood: up, left, down, right
    neigh_mask = tf.constant([(-1, 0), (0, -1), (1, 0), (0, 1)])
    y, x = center[0], center[1]

    # Drop the offsets that would fall outside the grid
    if tf.logical_and(tf.less(y, 1), tf.less(x, 1)):
        mask = neigh_mask[2:]  # top-left corner: keep down, right
    elif tf.less(y, 1):
        mask = neigh_mask[1:]  # first row: keep left, down, right
    elif tf.less(x, 1):
        mask = tf.concat([[neigh_mask[0]], neigh_mask[2:]], axis=0)  # first column
    else:
        mask = neigh_mask

    coords = center + mask

    neighborhood = tf.gather_nd(grid, coords)
    return neighborhood, coords

The function just avoids looking at non-existent coordinates and gathers all the required values with tf.gather_nd.
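
To make the gathering step concrete, here’s a tiny self-contained sketch (the 3x3 grid is made up for illustration) of how tf.gather_nd maps (y, x) coordinate pairs to values:

import tensorflow as tf

grid = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# The 4-neighborhood of the center (1, 1): up, left, down, right
coords = tf.constant([[0, 1], [1, 0], [2, 1], [1, 2]])
tf.print(tf.gather_nd(grid, coords))  # [2 4 8 6]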

We are now ready to loop over the padded image, center our “virtual” kernel (we never materialize an actual 3x3 kernel) on every pixel, find the neighborhood, and search for the lowest points.

Finding the lowest points

Since we are creating a pure TensorFlow program, the variables must be defined outside the methods decorated with tf.function. Thus, since we are interested in the sum of the risk levels, we add in the init a variable for this sum (_count). We’ll use a tf.TensorArray to save the coordinates of every low point found, and we return them as well.
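
Why must the variable be created in the constructor? A minimal toy sketch (the Counter module below is illustrative only, not part of the solution): tf.function traces the Python function into a graph, and creating a new tf.Variable on every trace is forbidden.

import tensorflow as tf

class Counter(tf.Module):
    def __init__(self):
        super().__init__()
        # State is created once, outside any tf.function
        self._count = tf.Variable(0)

    @tf.function
    def increment(self) -> tf.Tensor:
        # OK: the method only updates existing state
        self._count.assign_add(1)
        return self._count

Back to the solution: the Finder init gains the _count variable.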

def __init__(self, dataset: tf.data.Dataset):
    # [...]
    self._count = tf.Variable(0)

@tf.function
def low_points(self) -> Tuple[tf.Tensor, tf.Tensor]:
    self._count.assign(0)
    ta = tf.TensorArray(tf.int32, size=0, dynamic_size=True)

    for y in tf.range(self._padded_shape[0] - 1):
        for x in tf.range(self._padded_shape[1] - 1):
            center = tf.convert_to_tensor([y, x])
            neighborhood, _ = self._four_neigh(self._padded_image, center)
            # The center pixel plus its (up to) 4 neighbors
            extended_neighborhood = tf.concat(
                [tf.expand_dims(self._padded_image[y, x], axis=0), neighborhood],
                axis=0,
            )

            minval = tf.reduce_min(extended_neighborhood)
            # A low point: the center is the minimum and at least one neighbor differs
            if tf.logical_and(
                tf.reduce_any(tf.not_equal(extended_neighborhood, minval)),
                tf.equal(minval, self._padded_image[y, x]),
            ):
                # Risk level: 1 plus the low point height
                self._count.assign_add(1 + self._padded_image[y, x])

                ta = ta.write(ta.size(), center)

    return ta.stack(), self._count

Execution

Here we go!

finder = Finder(dataset)
tf.print("Part one: ", finder.low_points()[1])

On the example heightmap this prints 15, as expected. Part one is easily solved. Let’s see what part two is about.

Design phase: part two

Part 2 asks us to find the 3 largest basins and multiply their sizes together. A basin is all the locations that eventually flow downward to a single low point. Therefore, every low point has a basin. The size of a basin is the number of locations within the basin.

In the previous example there are 4 basins. Here’s the middle basin, of size 14 (dots mark the locations outside the basin):

..........
..878.....
.85678....
87678.....
.8........

and here’s also the top-right basin, of size 9:

.....43210
......4.21
.........2
..........
..........

How can we find the basins? We know for sure that every low point is inside a basin, so we can consider every low point as a seed for a flood fill algorithm. But how can we find the basin thresholds? All we know is that every location with a 9 is a natural threshold and that a basin is a flow of decreasing numbers around a low point.

Every computer vision practitioner immediately thinks of the image gradient when talking about changes in intensity. In particular, we can compute the horizontal and vertical gradients and, from them, the gradient magnitude, extracting the information about the rate of change at every pixel location.
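
As a reference, here’s a minimal sketch of how tf.image.image_gradients behaves (the 3x3 image is made up for illustration): it expects a batch with shape (batch, height, width, channels) and returns the pair (dy, dx) of finite differences, each with the same shape as the input.

import tensorflow as tf

image = tf.reshape(
    tf.constant([[2.0, 1.0, 9.0], [3.0, 9.0, 8.0], [9.0, 8.0, 5.0]]), (1, 3, 3, 1)
)
dy, dx = tf.image.image_gradients(image)
# dy[0, y, x, 0] = image[0, y+1, x, 0] - image[0, y, x, 0]; the last row is 0
# dx[0, y, x, 0] = image[0, y, x+1, 0] - image[0, y, x, 0]; the last column is 0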

For example, the gradient magnitude of the example image (computed on the padded image, hence the extra leftmost column of values) is:


 7  0 16  1  2  6  4  0  6  0 10
 6 12  2  4  0  0  6 10  8  6  9
 0  2  4  2  2  2  2  4  0  8 14
 1  0  0  4  3  2  6  0  0  0  1
 0  1  2  0  0  3  2  5  4  3  2

The higher the magnitude, the higher the change along the x and y directions. The gradient magnitude alone doesn’t give us a clear indication of the basin thresholds, but it shows where there are no changes (zeros) and where there are changes (ascending/descending values). If we combine this information with the location of the natural barriers (where the 9s are), we end up with this image (cropped to the original image extent):


 0 16 -1 -1 -1  4  0  6  0 10
12 -1  4  0  0 -1 10 -1  6  9
-1  4  2  2  2  2 -1  0 -1 14
 0  0  4  3  2 -1  0  0  0 -1
-1  2 -1 -1 -1  2  5  4  3  2

Here, every 9 location has been replaced with a -1. The basins are perfectly detected :)

Now that we are able to create the artificial thresholds for the flood fill algorithm, we only need to implement it. To do so, we’ll rely upon the tf.queue.FIFOQueue object.

TensorFlow queues

The TensorFlow tf.queue module offers several different implementations of the tf.queue.QueueBase object. The simplest one to use is FIFOQueue, a queue that dequeues elements in first-in, first-out order.

Differently from tf.TensorArray, we can treat the queue like a tf.Variable: declare it in the init and use it in a tf.function-decorated method without any problem.
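
A minimal sketch of the FIFOQueue API as used in this solution: a capacity of -1 means no size limit, and the queue is declared with a single int32 component.

import tensorflow as tf

queue = tf.queue.FIFOQueue(-1, [tf.int32])
queue.enqueue_many(tf.constant([1, 2, 3]))
tf.print(queue.dequeue())  # 1: first-in, first-out order
tf.print(queue.size())     # 2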

Detecting the basins

From the design phase: part two section, we know that we need to compute the image gradient and get the gradient magnitude. This is straightforward, since TensorFlow has a ready-to-use method for it. Moreover, we know we need to implement a flood fill algorithm, hence we need a queue.

Thus, in the init we add the _norm variable that will contain the image gradient magnitude (initialized to -1), together with the queue for the flood fill algorithm.

def __init__(self, dataset: tf.data.Dataset):
    # [...]
    # Gradient magnitude, initialized to -1 everywhere
    self._norm = tf.Variable(tf.zeros(self._padded_shape, dtype=tf.int32) - 1)
    # Unbounded (capacity -1) FIFO queue of int32 coordinates
    self._queue = tf.queue.FIFOQueue(-1, [tf.int32])

We can now write the basins function, which finds all the basins, computes their size, and returns the product of the three largest basins as output.

@tf.function
def basins(self) -> tf.Tensor:
    batch = tf.reshape(
        self._padded_image, (1, self._padded_shape[0], self._padded_shape[1], 1)
    )
    gradients = tf.squeeze(tf.image.image_gradients(batch), axis=1)

    y_grad, x_grad = gradients[0], gradients[1]

    # The gradient magnitude is constant where there are no changes, and it
    # increases (or stays constant) moving away from the low point (seed)
    norm = tf.cast(tf.norm(tf.cast(y_grad + x_grad, tf.float32), axis=-1), tf.int32)
    # Set the basin thresholds to -1 (where the 9s are)
    norm = tf.where(tf.equal(self._padded_image, 9), -1, norm)
    self._norm.assign(norm)

The tf.image.image_gradients function works on a batch of images, hence we first need to reshape the image from a tf.Tensor with shape (H,W) to a tensor with shape (1, H, W, 1).
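
The same reshape can also be written with tf.newaxis; a one-line sketch, assuming image is a (H, W) tensor:

batch = image[tf.newaxis, :, :, tf.newaxis]  # shape (1, H, W, 1)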

With these few lines of code, we are in the situation shown in the previous design phase: an image containing -1 where the thresholds are, and the gradient magnitudes inside the basins.

Now, we can use the low_points method to get the seeds for our flood fill algorithm and propagate them within the basins. We use a TensorArray to keep track of the three largest basin sizes and compute their product at the end.

# For every seed, "propagate" in a flood fill-fashion.
# The -1s are the thresholds
seeds = self.low_points()[0]
ta = tf.TensorArray(tf.int32, size=3)
ta = ta.unstack([0, 0, 0])
# Labels start at 2 so that -idx never collides with the -1 thresholds
for idx in tf.range(2, tf.shape(seeds)[0] + 2):
    # Fill with -idx (watershed-like: a different "color" per basin)
    seed = seeds[idx - 2]
    y = seed[0]
    x = seed[1]

    # Set the seed position to the label
    self._norm.scatter_nd_update([[y, x]], [-idx])

    # Find the 4 neighborhood, and get the values != -1
    neighborhood, neigh_coords = self._four_neigh(self._norm, seed)
    update_coords = tf.gather_nd(
        neigh_coords, tf.where(tf.not_equal(neighborhood, -1))
    )
    if tf.greater(tf.size(update_coords), 0):
        self._queue.enqueue_many(update_coords)
        while tf.greater(self._queue.size(), 0):
            pixel = self._queue.dequeue()
            # Update this pixel to the label value
            py, px = pixel[0], pixel[1]
            self._norm.scatter_nd_update([[py, px]], [-idx])
            px_neigh_vals, px_neigh_coords = self._four_neigh(self._norm, pixel)
            px_update_coords = tf.gather_nd(
                px_neigh_coords,
                tf.where(
                    tf.logical_and(
                        tf.not_equal(px_neigh_vals, -1),
                        tf.not_equal(px_neigh_vals, -idx),
                    )
                ),
            )
            if tf.greater(tf.size(px_update_coords), 0):
                self._queue.enqueue_many(px_update_coords)

    basin_size = tf.reduce_sum(tf.cast(tf.equal(self._norm, -idx), tf.int32))
    if tf.greater(basin_size, ta.read(0)):
        first = basin_size
        second = ta.read(0)
        third = ta.read(1)
        ta = ta.unstack([first, second, third])
    elif tf.greater(basin_size, ta.read(1)):
        first = ta.read(0)
        second = basin_size
        third = ta.read(1)
        ta = ta.unstack([first, second, third])
    elif tf.greater(basin_size, ta.read(2)):
        ta = ta.write(2, basin_size)

return tf.reduce_prod(ta.stack())
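
The execution mirrors part one; the print label below is my own, since the original execution line isn’t shown:

tf.print("Part two: ", finder.basins())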

If we print the content of self._norm after the execution of the algorithm, we can visualize the seed propagation.


-2 -2 -1 -1 -1 -3 -3 -3 -3 -3
-2 -1 -4 -4 -4 -1 -3 -1 -3 -3
-1 -4 -4 -4 -4 -4 -1 -5 -1 -3
-4 -4 -4 -4 -4 -1 -5 -5 -5 -1
-1 -4 -1 -1 -1 -5 -5 -5 -5 -5

Here we go! In the example, the four basins have sizes 3 (label -2), 9 (-3), 14 (-4), and 9 (-5), so the product of the three largest is 9 * 14 * 9 = 1134, as expected. Day 9 problem completely solved by treating it as a computer vision problem :)

Conclusion

You can see the complete solution in folder 9 on the dedicated Github repository: https://github.com/galeone/tf-aoc.

Solving this problem has been really fun, because reusing knowledge that comes from the computer vision domain (one of the domains I’m most involved in) is always exciting. In particular, treating the input as an image and thus solving the problem using image gradients, neighborhoods, flood fill, and all the other computer vision-related concepts is really fascinating to me.

The challenge-in-the-challenge of using only TensorFlow for solving the problem is slowly progressing: so far I have solved all the puzzles up to Day 12 (inclusive), so get ready for at least 3 more articles :) Let’s see when (and if!) TensorFlow alone won’t be enough.

If you missed the articles about the previous days’ solutions, they are all available on this blog.

The next article will be about my solution to the Day 10 problem. The problem is completely different from the ones faced so far, and it will be a nice showcase of how to use TensorFlow for working with strings (spoiler: the challenge is about text parsing and syntax checking!).

For any feedback or comment, please use the Disqus form below - thanks!

