Advent of Code 2021 in pure TensorFlow - day 9


The day 9 challenge can be seen as a computer vision problem. TensorFlow contains some computer vision utilities that we'll use - like the image gradient - but it's not a complete computer vision framework (like OpenCV). However, it offers primitive data types like tf.TensorArray and tf.queue that we can use to implement a flood fill algorithm in pure TensorFlow and solve the problem.

Day 9: Smoke Basin

You can click on the title above to read the full text of the puzzle. The TLDR version is:

Our dataset is a heightmap where each number corresponds to the height of a particular location. 9 is the maximum value, 0 is the lowest.

2199943210
3987894921
9856789892
8767896789
9899965678

Our first goal is to find the low points - the locations that are lower than all of their adjacent locations. Most locations have four adjacent locations (up, down, left, and right); locations on the edge or corner of the map have three or two adjacent locations, respectively. (Diagonal locations do not count as adjacent.)

Once the low points have been found, we must compute the sum of the risk levels. The risk level of a low point is defined as 1 plus its height.

In the example above there are 4 low points: two in the first row (with heights 1 and 0), one in the third row (height 5), and one in the last row (height 5). Their risk levels are 2, 1, 6, and 6, so the sum is 15.

Design phase: part one

The heightmap can be seen as a 2D image. Every height is a pixel, and every pixel has its 4-neighborhood. We consider only the 4-neighborhood because we are asked to look only at the horizontal and vertical neighbors, not at the diagonal ones.

We can loop over the image with a sliding-window approach, centering a 3x3 kernel on every location and, for each location, checking whether the center is the lowest point.

Finding the low points requires looking at every pixel, extracting its 4-neighbors, and finding the minimum value among the 5 pixels (the neighbors and the pixel itself). For dealing with border pixels, we can simply pad the input image with 9s (the maximum value).
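To make the padding idea concrete, here is a minimal standalone sketch (not the solution we'll build below, which uses a sliding window instead): it checks every location of the example map against its four neighbors with plain tensor slicing. The heightmap constant is just the example above.

import tensorflow as tf

# The example heightmap from the puzzle text.
heightmap = tf.constant(
    [
        [2, 1, 9, 9, 9, 4, 3, 2, 1, 0],
        [3, 9, 8, 7, 8, 9, 4, 9, 2, 1],
        [9, 8, 5, 6, 7, 8, 9, 8, 9, 2],
        [8, 7, 6, 7, 8, 9, 6, 7, 8, 9],
        [9, 8, 9, 9, 9, 6, 5, 6, 7, 8],
    ]
)

# Pad with 9s (the maximum value) so that every location has 4 valid neighbors.
padded = tf.pad(heightmap, [[1, 1], [1, 1]], constant_values=9)

# The four neighbors of every original pixel, obtained by slicing the padded map.
up, down = padded[:-2, 1:-1], padded[2:, 1:-1]
left, right = padded[1:-1, :-2], padded[1:-1, 2:]

# A low point is strictly lower than all of its neighbors.
low_mask = (
    (heightmap < up) & (heightmap < down) & (heightmap < left) & (heightmap < right)
)
tf.print(tf.reduce_sum(tf.boolean_mask(heightmap + 1, low_mask)))  # 15

The real solution below takes the sliding-window route instead, because the same per-pixel neighborhood extraction will be reused by the flood fill in part two.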

Input pipeline

We create a tf.data.Dataset object for reading the text file line-by-line as usual.

from typing import Tuple

import tensorflow as tf

dataset = (
    tf.data.TextLineDataset("input")
    .map(tf.strings.bytes_split)
    .map(lambda string: tf.strings.to_number(string, out_type=tf.int32))
)
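
A quick sanity check of the pipeline (assuming the puzzle input is saved in a file named input, as the code above expects):

# Peek at the first parsed line: a rank-1 tf.int32 tensor of digits.
for row in dataset.take(1):
    tf.print(row)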

Since we consider the whole dataset as a single image, we can convert it to a 2D tensor. We'll do this directly in the constructor of the Finder object we'll create to solve the problem. Moreover, since we are interested in the padded input image, we can also compute the padding amounts and pad the image with tf.pad.

class Finder(tf.Module):
    def __init__(self, dataset: tf.data.Dataset):

        super().__init__()

        self._dataset = dataset
        self._image = tf.convert_to_tensor(list(self._dataset))
        self._shape = tf.shape(self._image)
        self._max_value = tf.reduce_max(self._image)

        if tf.not_equal(tf.math.mod(self._shape[0], 3), 0):
            pad_h = self._shape[0] - (self._shape[0] // 3 + 3)
        else:
            pad_h = 0
        if tf.not_equal(tf.math.mod(self._shape[1], 3), 0):
            pad_w = self._shape[1] - (self._shape[1] // 3 + 3)
        else:
            pad_w = 0

        self._padded_image = tf.pad(
            self._image,
            [[0, pad_w], [1, pad_h]],
            mode="CONSTANT",
            constant_values=self._max_value,
        )

        self._padded_shape = tf.shape(self._padded_image)

Finding the 4-neighbors

We'll loop over the image in a sliding-window fashion and, for every pixel, search for its 4-neighborhood. Hence, we can define a function that, given an input grid (a 2D tensor) and the coordinates of a pixel (in TensorFlow format, hence (y, x)), returns both the neighborhood values and the neighborhood coordinates.

@tf.function
def _four_neigh(
    self, grid: tf.Tensor, center: tf.Tensor
) -> Tuple[tf.Tensor, tf.Tensor]:
    neigh_mask = tf.constant([(-1, 0), (0, -1), (1, 0), (0, 1)])
    y, x = center[0], center[1]

    if tf.logical_and(tf.less(y, 1), tf.less(x, 1)):
        mask = neigh_mask[2:]
    elif tf.less(y, 1):
        mask = neigh_mask[1:]
    elif tf.less(x, 1):
        mask = tf.concat([[neigh_mask[0]], neigh_mask[2:]], axis=0)
    else:
        mask = neigh_mask

    coords = center + mask

    neighborhood = tf.gather_nd(grid, coords)
    return neighborhood, coords

The function just avoids looking at non-existent coordinates and gathers all the required values with tf.gather_nd.
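
As a quick standalone illustration of the tf.gather_nd call (a toy 3x3 grid, not the puzzle data): given a grid and a list of (y, x) coordinates, it returns the values at those coordinates.

grid = tf.constant([[2, 1, 9], [3, 9, 8], [9, 8, 5]])
center = tf.constant([1, 1])
neigh_mask = tf.constant([(-1, 0), (0, -1), (1, 0), (0, 1)])
coords = center + neigh_mask  # [[0, 1], [1, 0], [2, 1], [1, 2]]
tf.print(tf.gather_nd(grid, coords))  # [1 3 8 8]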

We are now ready to loop over the padded image, center our “virtual” kernel (because we are not really defining a 3x3 kernel) on every pixel, find the neighborhood, and search for the lowest points.

Finding the low points

Since we are creating a pure TensorFlow program, the variables must be defined outside the methods decorated with tf.function. Thus, since we are interested in the sum of the risk levels, we add a variable for this sum (_count) in the init. We'll use a tf.TensorArray to save the coordinates of every low point found and return them as well.

def __init__(self, dataset: tf.data.Dataset):
    # [...]
    self._count = tf.Variable(0)

@tf.function
def low_points(self) -> Tuple[tf.Tensor, tf.Tensor]:
    self._count.assign(0)
    ta = tf.TensorArray(tf.int32, size=0, dynamic_size=True)

    for y in tf.range(self._padded_shape[0] - 1):
        for x in tf.range(self._padded_shape[1] - 1):
            center = tf.convert_to_tensor([y, x])
            neighborhood, _ = self._four_neigh(self._padded_image, center)
            extended_neighborhood = tf.concat(
                [tf.expand_dims(self._padded_image[y, x], axis=0), neighborhood],
                axis=0,
            )

            minval = tf.reduce_min(extended_neighborhood)
            if tf.logical_and(
                tf.reduce_any(tf.not_equal(extended_neighborhood, minval)),
                tf.equal(minval, self._padded_image[y, x]),
            ):
                self._count.assign_add(1 + self._padded_image[y, x])

                ta = ta.write(ta.size(), center)

    return ta.stack(), self._count

Execution

Here we go!

finder = Finder(dataset)
tf.print("Part one: ", finder.low_points()[1])

Part one is easily solved. Let’s see what part two is about.

Design phase: part two

Part 2 asks us to find the 3 largest basins and multiply their sizes together. A basin is all locations that eventually flow downward to a single low point. Therefore, every low point has a basin. The size of a basin is the number of locations within the basin.

In the previous example there are 4 basins. Here's the middle basin, of size 14 (only the basin locations are shown; every other location is replaced with a dot):

..........
..878.....
.85678....
87678.....
.8........

and here's the top-right basin, of size 9 (again, only the basin locations are shown):

.....43210
......4.21
.........2
..........
..........

How can we find the basins? We know for sure that every low point is inside a basin, so we can consider every low point a seed for a flood fill algorithm. But how can we find the basin thresholds? All we know is that every location with a 9 is a natural threshold and that, inside a basin, the heights increase as we move away from the low point.

Every computer vision practitioner immediately thinks of the image gradient when talking about changes in intensity. In particular, we can compute the horizontal and vertical gradients and combine them into the gradient magnitude, which measures the rate of change at every pixel location.

For example, the gradient magnitude of the example image is:


7 0 16 1 2 6 4 0 6 0
6 12 2 4 0 0 6 10 8 6
0 2 4 2 2 2 2 4 0 8 1
1 0 0 4 3 2 6 0 0 0 1
0 1 2 0 0 3 2 5 4 3 2

The higher the magnitude, the greater the change along the x and y directions. The gradient magnitude alone doesn't give us a clear indication of the basin thresholds, but it shows where there are no changes (zeros) and where there are changes (ascending/descending values). If we combine this information with the location of the natural barriers (where the 9s are), we end up with this image:


0  16 -1 -1 -1  4  0  6  0 10
12 -1  4  0  0 -1 10 -1  6  9
-1  4  2  2  2  2 -1  0 -1 14
 0  0  4  3  2 -1  0  0  0 -1
-1  2 -1 -1 -1  2  5  4  3  2

Here every 9 location has been replaced with a -1. The basins are perfectly detected :)
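
Putting the two steps together, here is a minimal standalone sketch of the same idea, reusing the heightmap constant defined in the sketch of the first design phase. Note that it skips the padding applied by the real solution, so the values inside the basins may differ slightly from the figure above, while the -1 thresholds are the same.

# image_gradients wants a (batch, H, W, channels) tensor.
batch = tf.reshape(heightmap, (1, 5, 10, 1))
dy, dx = tf.image.image_gradients(batch)

# Combine the two directional gradients into a single per-pixel magnitude.
magnitude = tf.cast(tf.norm(tf.cast(dy + dx, tf.float32), axis=-1), tf.int32)[0]

# Natural barriers: every 9 becomes a -1 threshold.
thresholds = tf.where(tf.equal(heightmap, 9), -1, magnitude)
tf.print(thresholds, summarize=-1)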

Now that we are able to create the artificial thresholds for the flood fill algorithm, we only need to implement it. To do so, we'll rely upon the tf.queue.FIFOQueue object.

TensorFlow queues

The TensorFlow tf.queue module offers several different implementations of the tf.queue.QueueBase object. The simplest one to use is the FIFOQueue, a queue that dequeues elements in first-in, first-out order.

Unlike tf.TensorArray, we can treat the queue like a tf.Variable: we declare it in the init and use it in a tf.function-decorated method without any problem.
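
A tiny usage sketch (with toy coordinates, not the solution code) of the two operations we'll need, enqueue_many and dequeue:

queue = tf.queue.FIFOQueue(-1, [tf.int32])  # capacity -1 means unbounded
queue.enqueue_many(tf.constant([[0, 1], [2, 3]]))  # enqueue two (y, x) pairs
tf.print(queue.size())     # 2
tf.print(queue.dequeue())  # [0 1]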

Detecting the basins

From the design phase: part two section, we know that we need to compute the image gradient and its magnitude. This is straightforward since TensorFlow has a ready-to-use method for it. Moreover, we know we need to implement a flood fill algorithm, hence we need a queue.

Thus, in the init we add the _norm variable that will contain the image gradient magnitude (initialized to -1), and we also add the queue for the flood fill algorithm.

def __init__(self, dataset: tf.data.Dataset):
    # [...]
    self._norm = tf.Variable(tf.zeros(self._padded_shape, dtype=tf.int32) - 1)
    self._queue = tf.queue.FIFOQueue(-1, [tf.int32])

We can now write the basins function, which finds all the basins, computes their size, and returns the product of the three largest basins as output.

@tf.function
def basins(self) -> tf.Tensor:
    batch = tf.reshape(
        self._padded_image, (1, self._padded_shape[0], self._padded_shape[1], 1)
    )
    gradients = tf.squeeze(tf.image.image_gradients(batch), axis=1)

    y_grad, x_grad = gradients[0], gradients[1]

    # The gradient magnitude is constant where there are no changes;
    # it increases or stays constant moving away from the low point (seed)
    norm = tf.cast(tf.norm(tf.cast(y_grad + x_grad, tf.float32), axis=-1), tf.int32)
    # Set the basin thresholds to -1 (where the 9s are)
    norm = tf.where(tf.equal(self._padded_image, 9), -1, norm)
    self._norm.assign(norm)

The tf.image.image_gradients function works on a batch of images, hence we first need to reshape the image from a tf.Tensor with shape (H,W) to a tensor with shape (1, H, W, 1).

With these few lines of code, we are in the situation shown in the previous design phase, with an image containing -1 where the thresholds are, and the gradient magnitudes inside the basins.

Now, we can use the low_points method to get the seeds for our flood fill algorithm and propagate them within the basins. We use a tf.TensorArray to keep track of the three largest basin sizes and compute their product at the end.

# For every seed, "propagate" in a flood-fill fashion.
# The -1s are the thresholds
seeds = self.low_points()[0]
ta = tf.TensorArray(tf.int32, size=3)
ta = ta.unstack([0, 0, 0])
for idx in tf.range(2, tf.shape(seeds)[0] + 2):
    # Fill with idx (watershed like: different colors)
    seed = seeds[idx - 2]
    y = seed[0]
    x = seed[1]

    # Set the seed position to the label
    self._norm.scatter_nd_update([[y, x]], [-idx])

    # Find the 4 neighborhood, and get the values != -1
    neighborhood, neigh_coords = self._four_neigh(self._norm, seed)
    update_coords = tf.gather_nd(
        neigh_coords, tf.where(tf.not_equal(neighborhood, -1))
    )
    if tf.greater(tf.size(update_coords), 0):
        self._queue.enqueue_many(update_coords)
        while tf.greater(self._queue.size(), 0):
            pixel = self._queue.dequeue()
            # Update this pixel to the label value
            py, px = pixel[0], pixel[1]
            self._norm.scatter_nd_update([[py, px]], [-idx])
            px_neigh_vals, px_neigh_coords = self._four_neigh(self._norm, pixel)
            px_update_coords = tf.gather_nd(
                px_neigh_coords,
                tf.where(
                    tf.logical_and(
                        tf.not_equal(px_neigh_vals, -1),
                        tf.not_equal(px_neigh_vals, -idx),
                    )
                ),
            )
            if tf.greater(tf.size(px_update_coords), 0):
                self._queue.enqueue_many(px_update_coords)

    basin_size = tf.reduce_sum(tf.cast(tf.equal(self._norm, -idx), tf.int32))
    if tf.greater(basin_size, ta.read(0)):
        first = basin_size
        second = ta.read(0)
        third = ta.read(1)
        ta = ta.unstack([first, second, third])
    elif tf.greater(basin_size, ta.read(1)):
        first = ta.read(0)
        second = basin_size
        third = ta.read(1)
        ta = ta.unstack([first, second, third])
    elif tf.greater(basin_size, ta.read(2)):
        ta = ta.write(2, basin_size)

return tf.reduce_prod(ta.stack())

If we print the content of self._norm after the execution of the algorithm, we can visualize the seed propagation.


-2 -2 -1 -1 -1 -3 -3 -3 -3 -3
-2 -1 -4 -4 -4 -1 -3 -1 -3 -3
-1 -4 -4 -4 -4 -4 -1 -5 -1 -3
-4 -4 -4 -4 -4 -1 -5 -5 -5 -1
-1 -4 -1 -1 -1 -5 -5 -5 -5 -5
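
For completeness, running part two only requires calling the basins method on the finder instance created for part one:

tf.print("Part two: ", finder.basins())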

Here we go! Day 9 problem completely solved by treating it as a computer vision problem :)

Conclusion

You can see the complete solution in folder 9 on the dedicated Github repository: https://github.com/galeone/tf-aoc.

Solving this problem has been really fun because reusing knowledge that comes from the computer vision domain (one of the domains I'm most involved in) is always exciting. In particular, treating the input as an image and, thus, solving the problem using image gradients, neighborhoods, the flood fill algorithm, and all the other computer vision-related concepts is really fascinating to me.

The challenge-in-the-challenge of using only TensorFlow to solve the problems is slowly progressing: so far I have solved all the puzzles up to Day 12 (inclusive). So get ready for at least 3 more articles :) Let's see when (and if!) TensorFlow alone won't be enough.


The next article will be about my solution to the Day 10 problem. That problem is completely different from the ones faced so far, and it will be a nice showcase of how to use TensorFlow for working with strings (spoiler: the challenge is about text parsing and syntax checking!).



Solving problem 8 of the AoC 2022 in pure TensorFlow is straightforward. After all, this problem requires working on a bi-dimensional grid and evaluating conditions by rows or columns. TensorFlow is perfectly suited for this kind of task thanks to its native support for reduction operators (tf.reduce) which are the natural choice for solving problems of this type.