Advent of Code 2021 in pure TensorFlow - day 7


The day 7 challenge is easily solvable with the help of the TensorFlow ragged tensors. In this article, we’ll solve the puzzle while learning what ragged tensors are and how to use them.

Part 1 doesn’t require ragged tensors, but to easily solve part 2 we’ll introduce them briefly.

Day 7: The Treachery of Whales

You can click on the title above to read the full text of the puzzle. The TLDR version is:

Our input is a comma-separated list of horizontal positions

16,1,2,0,4,2,7,1,2,14

Each number represents the position on the X-axis of a crab. Every crab can move along the horizontal plane, and every movement costs 1 unit of fuel.

The text shows us the position that costs the least fuel: 2.

  • Move from 16 to 2: 14 fuel
  • Move from 1 to 2: 1 fuel
  • Move from 2 to 2: 0 fuel
  • Move from 0 to 2: 2 fuel
  • Move from 4 to 2: 2 fuel
  • Move from 2 to 2: 0 fuel
  • Move from 7 to 2: 5 fuel
  • Move from 1 to 2: 1 fuel
  • Move from 2 to 2: 0 fuel
  • Move from 14 to 2: 12 fuel

This costs a total of 37 fuel. This is the cheapest possible outcome; more expensive outcomes include aligning at position 1 (41 fuel), position 3 (39 fuel), or position 10 (71 fuel).

The puzzle goal is to align all the crabs to a position that minimizes the overall fuel consumption.

Design phase: part one

There are two observations to do:

  • The smallest possible fuel consumption is 0. Hence, it’s likely for the optimal configuration to be in the neighborhood of the more crowded position. In fact, in the example, 2 is the most frequent position.
  • Minimizing the overall fuel consumption means finding the $x$ value that satisfies the relation \(\sum_{i} {\left\|p_i - x\right\|} < \sum_{i} {\left\|p_i - y\right\|} \quad \forall x \neq y\) where \(x\) and \(y\) are elements of the position dataset and the sums are performed over all the available positions.

The observation 2 domain can be constrained to something smaller than all the available positions because of observation 1. The correct estimation of the neighborhood size is not required, hence we can just fix it to an arbitrary big value like half the dataset size.

Input pipeline

We create a tf.data.Dataset object for reading the text file line-by-line as usual. Since the dataset is a single line, we can keep it in memory and convert the dataset as a tf.Tensor that’s easy to use.

dataset = (
    tf.data.TextLineDataset("input")
    .map(lambda string: tf.strings.split(string, ","))
    .map(lambda string: tf.strings.to_number(string, out_type=tf.int64))
    .unbatch()
)

dataset_tensor = tf.convert_to_tensor(list(dataset))

Finding the neighborhood of the most frequent position

Observation 1 requires us to

  1. Find the most frequent position
  2. Decide a neighborhood size
  3. Find the neighborhood extremes (the min/max positions)
y, idx, count = tf.unique_with_counts(dataset_tensor, tf.int64)

max_elements = tf.reduce_max(count)
most_frequent_position = y[idx[tf.argmax(count)]]

neighborhood_size = tf.constant(
    tf.shape(dataset_tensor, tf.int64)[0] // tf.constant(2, tf.int64), tf.int64
)

min_neigh_val = tf.clip_by_value(
    most_frequent_position - neighborhood_size,
    tf.constant(0, tf.int64),
    most_frequent_position,
)

max_val = tf.reduce_max(dataset_tensor) + 1
max_neigh_val = tf.clip_by_value(
    most_frequent_position + neighborhood_size,
    most_frequent_position,
    max_val,
)

tf.unique_with_counts in combination with tf.argmax allows us to find the most frequent position.

The neighborhood size is arbitrary set to half the dataset size, and the neighborhood extremes are constrained into the [0, max(dataset)] range.

Minimizing the cost

Observation 2 is precisely the formula to implement. We look around in the neighborhood of the most frequent value, find the cost, and check if it’s the minimum cost found.

min_cost = tf.Variable(tf.cast(-1, tf.uint64))
found_position = tf.Variable(-1, dtype=tf.int64)

for x in tf.range(min_neigh_val, max_neigh_val):
    cost = tf.cast(tf.reduce_sum(tf.abs(dataset_tensor - x)), tf.uint64)
    if tf.less(cost, min_cost):
        min_cost.assign(cost)
        found_position.assign(x)
tf.print("(part one) min_cost: ", min_cost, " in position: ", found_position)

Pretty standard. The only peculiarity is how the min_cost variable has been initialized. Since there are no constants exposed by TensorFlow for the min/max values for integer values we can just define min_cost as an uint64 variable, initialized with a -1. -1 is represented as an integer value with all the bits set to 1, and thus if interpreted as an uint64 it gives us the maximum representable value for an unsigned int at 64 bits.

Part 1 is solved! We are ready for the part 2 challenge.

Design phase: part 2

The puzzle introduces a slight variation of fuel consumption. It’s not more constant (1 fuel unit per position), but it grows linearly with the position change: the first step costs 1, the second step costs 2, the third step costs 3, and so on.

The previous example best position is not 5, in fact

  • Move from 16 to 5: 66 fuel
  • Move from 1 to 5: 10 fuel
  • Move from 2 to 5: 6 fuel
  • Move from 0 to 5: 15 fuel
  • Move from 4 to 5: 1 fuel
  • Move from 2 to 5: 6 fuel
  • Move from 7 to 5: 3 fuel
  • Move from 1 to 5: 10 fuel
  • Move from 2 to 5: 6 fuel
  • Move from 14 to 5: 45 fuel

This costs a total of 168 fuel. This is the new cheapest possible outcome; the old alignment position (2) now costs 206 fuel instead.

This requirement requires us only to change how our fuel consumption is calculated, the observations performed in the initial design phase are still valid.

In practice, now we can compute the distance from a candidate position as we previously did (\(d_x = \left\| p_i - x \right\|\)) and use this distance to compute the fuel consumption.

The new fuel consumption is d_x + d_{x-} + \cdots + 1 + 0. There are several ways for doing this calculation, one of them is to create tensors with a different number of elements - depending on the value of the distance - and sum them together. The perfect tool for doing it is using ragged tensors.

TensorFlow ragged tensors

The documentation is pretty clear:

Your data comes in many shapes; your tensors should too. Ragged tensors are the TensorFlow equivalent of nested variable-length lists. They make it easy to store and process data with non-uniform shapes

There’s a whole package dedicated to ragged tensors: tf.ragged. This package defines ops for manipulating ragged tensors (tf.RaggedTensor), which are tensors with non-uniform shapes. In particular, each RaggedTensor has one or more ragged dimensions, which are dimensions whose slices may have different lengths. For example, the inner (column) dimension of rt=[[3, 1, 4, 1], [], [5, 9, 2], [6], []] is ragged, since the column slices (rt[0, :], ..., rt[4, :]) have different lengths.

Ragged range

In the tf.ragged package we have the tf.ragged.range function that’s the perfect candidate for implementing our solution. In fact, given a tensor containing the distances of the point x from all the other points of the dataset [3, 1, 5, ...] we can create in a single - parallel - step a ragged tensor containing all the fuel consumptions

[
    [0 1 2],
    [0],
    [0 1 2 3 4],
    ...
]

and sum 1 to all of them for obtaining the lists with the values to sum for obtaining the fuel consumption, and thus the cost.

# -- Part 2 --
min_cost, found_position = tf.cast(-1, tf.uint64), -1
min_cost.assign(tf.cast(-1, tf.uint64))
found_position.assign(-1)
for x in tf.range(min_neigh_val, max_neigh_val):
    diff = tf.abs(dataset_tensor - x)
    lists = tf.ragged.range(tf.ones(tf.shape(diff)[0], dtype=tf.int64), diff + 1)
    cost = tf.cast(tf.reduce_sum(lists), tf.uint64)
    if tf.less(cost, min_cost):
        min_cost.assign(cost)
        found_position.assign(x)

tf.print("(part two) min_cost: ", min_cost, " in position: ", found_position)

Using the ragged tensors has been possible to solve part 2 with a single change to part 1.

Conclusion

You can see the complete solution in folder 7 on the dedicated Github repository: https://github.com/galeone/tf-aoc.

The challenge in the challenge of using only TensorFlow for solving the problem is slowly progressing, so far I solved all the puzzles up to Day 11 (inclusive). So get ready for at least 4 more articles :) Let’s see when (and if!) TensorFlow alone won’t be enough.

If you missed the articles about the previous days’ solutions, here’s a handy list:

The next article will be about my solution to Day 8 problem. I’d be honest, that solution is ugly since the problem itself requires a bunch of if statement for solving it - nothing exciting. The solution to Day 9 problem, instead, is way more interesting because I solved it using lots of computer vision concepts like image gradients and flood fill algorithm!

For any feedback or comment, please use the Disqus form below - thanks!

Don't you want to miss the next article? Do you want to be kept updated?
Subscribe to the newsletter!

Related Posts

Advent of Code 2021 in pure TensorFlow - day 12

Day 12 problem projects us the world of graphs. TensorFlow can be used to work on graphs pretty easily since a graph can be represented as an adjacency matrix, and thus, we can have a tf.Tensor containing our graph. However, the "natural" way of exploring a graph is using recursion, and as we'll see in this article, this prevents us to solve the problem using a pure TensorFlow program, but we have to work only in eager mode.

Advent of Code 2021 in pure TensorFlow - day 11

The Day 11 problem has lots in common with Day 9. In fact, will re-use some computer vision concepts like the pixel neighborhood, and we'll be able to solve both parts in pure TensorFlow by using only a tf.queue as a support data structure.

Advent of Code 2021 in pure TensorFlow - day 10

The day 10 challenge projects us in the world of syntax checkers and autocomplete tools. In this article, we'll see how TensorFlow can be used as a generic programming language for implementing a toy syntax checker and autocomplete.

Advent of Code 2021 in pure TensorFlow - day 9

The day 9 challenge can be seen as a computer vision problem. TensorFlow contains some computer vision utilities that we'll use - like the image gradient - but it's not a complete framework for computer vision (like OpenCV). Anyway, the framework offers primitive data types like tf.TensorArray and tf.queue that we can use for implementing a flood-fill algorithm in pure TensorFlow and solve the problem.

Advent of Code 2021 in pure TensorFlow - day 8

The day 8 challenge is, so far, the most boring challenge faced 😅. Designing a TensorFlow program - hence reasoning in graph mode - would have been too complicated since the solution requires lots of conditional branches. A known AutoGraph limitation forbids variables to be defined in only one branch of a TensorFlow conditional if the variable is used afterward. That's why the solution is in pure TensorFlow eager.

Advent of Code 2021 in pure TensorFlow - day 6

The day 6 challenge has been the first one that obliged me to completely redesign for part 2 the solution I developed for part 1. For this reason, in this article, we'll see two different approaches to the problem. The former will be computationally inefficient but will completely model the problem, hence it will be easy to understand. The latter, instead, will be completely different and it will focus on the puzzle goal instead of the complete modeling.

Advent of Code 2021 in pure TensorFlow - day 5

The day 5 challenge is easily solvable in pure TensorFlow thanks to its support for various distance functions and the power of the tf.math package. The problem only requires some basic math knowledge to be completely solved - and a little bit of computer vision experience doesn't hurt.

Advent of Code 2021 in pure TensorFlow - day 4

Using tensors for representing and manipulating data is very convenient. This representation allows changing shape, organizing, and applying generic transformations to the data. TensorFlow - by design - executes all the data manipulation in parallel whenever possible. The day 4 challenge is a nice showcase of how choosing the correct data representation can easily simplify a problem.

Advent of Code 2021 in pure TensorFlow - day 3

A Solution to the AoC day 3 puzzle in pure TensorFlow. This challenge allows us to explore the TensorArray data type and find their limitations when used inside a static-graph context. We'll also use a tf.function experimental (but very useful) feature for avoiding useless retraces and reusing the same graph with tensors of different shapes.

Advent of Code 2021 in pure TensorFlow - day 2

A Solution to the AoC day 2 puzzle in pure TensorFlow. How to use Enums in TensorFlow programs and the limitations of tf.Tensor used for type annotation