The day 3 challenge is very different from the easy challenges faced during day 1 and day 2. This time, we need to face a more difficult challenge and by doing so we’ll explore some useful, although not widely used, TensorFlow features like
Moreover, we’ll find some limitations (bug?) of the
TensorArray data type and we’ll write some interesting utility that’s not present in the TensorFlow standard library.
Day 3: Binary Diagnostic: part one
You can click on the title above to read the full text of the puzzle. The TLDR version is:
You are given a dataset in the format
00100 11110 10110 10111 10101 01111 00111 11100 10000 11001 00010 01010
The text asks to generate two new binary numbers called gamma rate and epsilon rate. The goal is to multiply these numbers together and give the result in decimal. This result is called “power consumption”.
- Gamma rate: the number can be determined by finding the most common bit in the corresponding position of all numbers. For example, the gamma rate for the example dataset is
- Epsilon rate: exactly like the gamma rate, but instead of finding the most common bit, we are asked to find the least common bit. In the example, the epsilon rate is
The result is thus
22 * 9 = 198.
There are several small challenges to face
- Convert a binary number to decimal. There’s no ready-to-use function in the TensorFlow library, so we have to write it by ourselves.
- Finding the most common element in a set. A set because the order doesn’t matter.
There are also some observations to do:
- The gamma rate and the epsilon rate are complementary. Hence we can switch from one to the other by applying the bitwise not. e.g
~01001 = 10110. This means we can focus only on finding the most common element and the least common element can be easily found with a single bitwise operation.
- When searching for the most frequent elements, we can have undefined situations where the set is perfectly balanced (e.g. 50% of
1and 50% of
0). We should handle this undefined situation.
In addition, differently from day 1 and day 2, there’s no advantage in using a
tf.data.Dataset object and loop throw it. In fact, the
tf.data.Dataset object is very convenient when we have to loop over the data, group them, filter them, and apply transformations over it. In this case, however, once we have the input converted it’s more convenient to consider the whole dataset as a single
In fact, knowing the cardinality (number of elements) of a TensorFlow dataset is not always possible. For easily addressing consideration 2 expressed above, knowing the cardinality can simplify the problem (we can effectively know if we reached 50%).
We create a
tf.data.Dataset object as usual but this time, we convert it to a
dataset = ( tf.data.TextLineDataset("input") # "0101" .map(tf.strings.bytes_split) # '0', '1', '0', '1' .map(lambda digit: tf.strings.to_number(digit, out_type=tf.int64)) # 0 1 0 1 ) # We can do this in a raw way, treating the whole dataset as a tensor # so we can know its shape and extract the most frequent elements easily tensor_dataset = tf.convert_to_tensor(list(dataset))
Interesting is the usage of the
tf.strings.bytes_split function to convert a string into a tensor of chars and then convert every char into a number.
tensor_dataset is a
tf.Tensor with a shape of
(cardinality, number of bits). This representation is very friendly when searching for the most frequent bit.
Most frequent bits
Using TensorFlow we have the huge advantages of parallel calculation. In fact, instead of singularly looking for the first position, search for the most frequent bit, then move to the second position, search for the most frequent bit, and so on… We can do it all at once.
The most frequent bit, for every position, can be computed as the comparison between the sum of all the elements across the 0-axis and half the dataset cardinality.
We can take into account the undefined scenario (number of ones equal to the number of zero, the equivalent of the number of ones equal to half dataset cardinality), by returning a bitmask-like
True where this condition holds.
@tf.function def most_frequent_bits(tensor: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]: """Counts the most frequent bits in the input tensor. Args: tensor: a tensor with shape (cardinality, number of bits) Returns: (most_frequent, undefined). Both tensor with shape (n_bits). - most_frequent: every position contains the most frequent bit - undefined: every position containing a True marks that position like undefined. There's perfect balanced beweeen 1s and 0s. """ count = tf.reduce_sum(tensor, axis=0) tot = tf.cast(tf.shape(tensor), tf.int64) half = tot // 2 ret = tf.cast(tf.greater(count, half), tf.int64) return tf.squeeze(ret), tf.squeeze( tf.logical_and(tf.equal(count, half), tf.equal(tf.math.mod(tot, 2), 0)) ) # True where #1 == #0
Binary to decimal conversion
The binary number computed by
most_frequent_bits, that’s a
tf.Tensor with shape
(n_bits), should be converted from binary to decimal to submit (and visualize) the result.
bin2dec function can be easily defined. It’s just the implementation, as usual in pure TensorFlow, of the binary to decimal base conversion.
@tf.function def bin2dec(bin_tensor: tf.Tensor): two = tf.cast(2, tf.int64) return tf.reduce_sum( tf.reverse(bin_tensor, axis=) * two ** tf.range(tf.size(bin_tensor), dtype=tf.int64) )
I defined the
two as a constant by using the
tf.cast operation because I wanted to accept as input a
tf.int64 dtype (I know, a complete waste of storage for only 0s and 1s), and since TensorFlow requires all the types to be the same, I had to explicitly define
two in this way. The alternative was to define it as a ‘tf.constant(2, dtype=tf.int64)`.
Since we know, from the initial observations, that
gamma_rate is the complement of
epsilon_rate (and vice versa), we already have all we need to solve the problem!
gamma_rate, _ = most_frequent_bits(tensor_dataset) tf.print("gamma rate (bin): ", gamma_rate) gamma_rate_dec = bin2dec(gamma_rate) tf.print("gamma rate (dec): ", gamma_rate_dec) # epsilon rate is the complement epsilon_rate = tf.cast(tf.logical_not(tf.cast(gamma_rate, tf.bool)), tf.int64) tf.print("epsilon rate (bin): ", epsilon_rate) epsilon_rate_dec = bin2dec(epsilon_rate) tf.print("epislon rate (dec): ", epsilon_rate_dec) power_consuption = gamma_rate_dec * epsilon_rate_dec tf.print("power consumption: ", power_consuption)
Here we go, part 1 solved!
As I mentioned in the introduction, part 3 requires the usage of
tf.TensorArray - let’s see where precisely in part 2.
Day 3: Binary Diagnostic: part two
The puzzle becomes way more complicated. The text asks to determine the so-called life support rating, that’s the product of the oxygen generator rating and CO2 scrubber rating.
These 2 rating values are hidden in the input dataset, and there are several conditions for extracting them. I hereby quote the text from the [official Day 3 - Advent of Code 2021] page, since there’s no way to summarize the process more than this.
Both values are located using a similar process that involves filtering out values until only one remains. Before searching for either rating value, start with the full list of binary numbers from your diagnostic report and consider just the first bit of those numbers. Then:
- Keep only numbers selected by the bit criteria for the type of rating value for which you are searching. Discard numbers which do not match the bit criteria.
- If you only have one number left, stop; this is the rating value for which you are searching.
- Otherwise, repeat the process, considering the next bit to the right.
The bit criteria depends on which type of rating value you want to find:
- To find oxygen generator rating, determine the most common value (0 or 1) in the current bit position, and keep only numbers with that bit in that position. If 0 and 1 are equally common, keep values with a 1 in the position being considered.
- To find CO2 scrubber rating, determine the least common value (0 or 1) in the current bit position, and keep only numbers with that bit in that position. If 0 and 1 are equally common, keep values with a 0 in the position being considered.
The highlighted parts are great hints since they pinpoint some design direction.
Design phase - part two
The task requires to start considering the first bit. Filter the dataset based on the first bit by satisfying the bit criteria, and with this new dataset repeat the process considering the next bit. Repeat until a single value remains.
The bit criteria depend on the rating we are interested in, if oxygen or CO2, and this conditions the usage of our previously defined
most_common_bits. Moreover, the “undefined” condition we considered earlier will be very handy in this phase given that we need to decide to keep values with a 1 (or 0) in a certain position if the number of 1s and 0s in the examined position matches.
Let’s see how to implement the filter by bit criteria.
Filter by bit criteria
Depending on the criteria, we should produce a new dataset ready for the next iteration. The parameters that change the behavior of the filter are:
current_bit_positionindicates which bit to consider while filtering.
- A boolean flag (called
oxygen) that changes the behavior from the search from the most common to the least common bit.
class RateFinder(tf.Module): def __init__(self, bits): super().__init__() # Constants self._zero = tf.constant(0, tf.int64) self._one = tf.constant(1, tf.int64) self._two = tf.constant(2, tf.int64) # ... we'll add more fields later in the tutorial @tf.function(experimental_relax_shapes=True) def filter_by_bit_criteria( self, dataset_tensor: tf.Tensor, current_bit_position: tf.Tensor, oxygen: tf.Tensor, ): if oxygen: flag = self._one frequencies, mask = most_frequent_bits(dataset_tensor) else: flag = self._zero frequencies, mask = most_frequent_bits(dataset_tensor) frequencies = tf.cast( tf.logical_not(tf.cast(frequencies, tf.bool)), tf.int64, ) # #0 == #1 pick the elements with the correct bitflag if mask[current_bit_position]: indices = tf.where( tf.equal( dataset_tensor[:, current_bit_position], flag, ) ) else: indices = tf.where( tf.equal( dataset_tensor[:, current_bit_position], frequencies[current_bit_position], ) ) # All elements with the bit "position" equal to frequencies[position] gathered = tf.gather_nd(dataset_tensor, indices) return gathered
That’s the precise implementation of the requirements for the bit criteria. There are 2 details worth mentioning of this implementation
- The constants usage. Instead of using Python numbers, I defined
tf.constants in the
init. This is the recommended approach for having control over the data types, and for avoiding useless conversions. Our graph is really static, and these are constants. Autograph cannot change this.
experimental_relax_shapes=True: the input dataset will change every time we iterate over a new result.
tf.functionby default creates a new graph every time the input changes. If you’re using Python values, it creates it every time the value changes (hence, never use Python scalars as input!). If you’re using (as we are)
tf.Tensoras input, if the
tf.Tensorchanges, a new graph is created. This behavior is not ideal for this scenario, but luckily we can set the
Trueto change this behavior and re-use the same graph even when the shape changed
Having a tensor that changes its shape is something that may sound strange to many TensorFlow practitioner.
In fact, almost all the objects in TensorFlow are immutable. For example, a
tf.Variable once defined, can never change its shape. If you try to assign something with a different shape to a
tf.Variable you’ll get an error (and graph-definition time if the shape of the right-hand element is known at that time, at runtime otherwise).
TensorArray as mutable-shape variables
The only data structure that can change its shape dynamically and be treated (more or less) like
tf.TensorArray( dtype, size=None, dynamic_size=None, clear_after_read=None, tensor_array_name=None, handle=None, flow=None, infer_shape=True, element_shape=None, colocate_with_first_write_call=True, name=None )
The signature is pretty clear. The only required argument is the
dtype. What’s really interesting are the
The former allows to define the initial size of the TensorArray, the latter enables the TensorArray to grow past its initial size.
This feature seems to perfectly match our requirement of a shape-changing
tf.Tensor dataset (and not a
tf.data.Dataset since we do need to know, without looping over the dataset uselessly, the cardinality).
It’s possible to read and write singularly the elements of a TensorArray, but for our case, we are interested in the complete read and complete write. For completely overwriting the TensorArray content we need to call the
unstack method, while for converting the content to a
tf.Tensor the method to call is
This is all we need to know for solving our problem. However, the documentation contains lots of examples and info.
Finding the ratings
So we have two ratings to find,
CO2. We just need to loop over the bits, apply
filter_by_bit_criteria to obtain a new dataset, and continue until we don’t have a single
tf.Tensor that’s our searched rating.
We need some states for our TensorFlow program, one is the
_ta) the other is the rating
_rating. In particular, we need this
_rating variable since it’s not possible to invoke the
return statement while we are inside a loop when working in graph mode.
def __init__(self, bits): # ... previous fields omitted self._bits = tf.constant(tf.cast(bits, tf.int64)) # Variables self._rating = tf.Variable(tf.zeros([bits], dtype=tf.int64), trainable=False) self._frequencies = tf.Variable( tf.zeros([bits], dtype=tf.int64), trainable=False ) self._ta = tf.TensorArray( size=1, dtype=tf.int64, dynamic_size=True, clear_after_read=True ) # @tf.function def find(self, dataset_tensor: tf.Tensor, oxygen: tf.Tensor): num_bits = tf.shape(dataset_tensor)[-1] self._ta.unstack(dataset_tensor) for current_bit_position in tf.range(num_bits): ta = self._ta.stack() gathered = tf.squeeze( self.filter_by_bit_criteria(ta, current_bit_position, oxygen) ) if tf.equal(tf.size(gathered), num_bits): self._rating.assign(gathered) break self._ta.unstack(gathered) return self._rating
find method iterates over the
dataset_tensor, invokes the filter method passing the
oxygen boolean (tensor) flag and it returns the found rating when found.
Execution - part two
Very similar to the previous execution, this time we create our
finder that’s an instance of
RateFinder and use it to find the required rates.
# gamma_rate contains the most frequent bit in each position 0 1 0 1 0 ... # starting from that, we can gather all the numbers that have the more common bit # in the "position". finder = RateFinder(bits=tf.size(epsilon_rate)) oxygen_generator_rating = finder.find(tensor_dataset, True) tf.print("Oxygen generator rating (bin): ", oxygen_generator_rating) oxygen_generator_rating_dec = bin2dec(oxygen_generator_rating) tf.print("Oxygen generator rating (dec): ", oxygen_generator_rating_dec) co2_generator_rating = finder.find(tensor_dataset, False) tf.print("C02 scrubber rating (bin): ", co2_generator_rating) co2_generator_rating_dec = bin2dec(co2_generator_rating) tf.print("C02 scrubber rating (dec): ", co2_generator_rating_dec) tf.print( "life support rating = ", oxygen_generator_rating_dec * co2_generator_rating_dec )
It works! Problem 3 is solved!
But maybe you noticed something strange in the find method…
TensorArray limitation in graph-mode
find method can’t be converted to its graph counterpart. That’s why I commented out the
@tf.function decoration. There’s a known bug/limitation/it-works-in-this-way-by-design in TensorArray that’s not clear whenever (if ever) it will be solved.
If I re-enable the decoration, this is what happens
File "/home/pgaleone/tf/aoc/3/main.py", line 99, in find * self._ta.unstack(gathered) ValueError: Cannot infer argument `num` from shape <unknown>
In fact, TensorFlow creates static graphs and all this dynamism that TensorArrays give seems to not integrate correctly with the rest of the static-graph ecosystem. The problem is during the
unstack call, but since
gathered Tensor shape will change on every iteration this can’t be converted to a static graph. :(
You can see the complete solution in folder
3 on the dedicated Github repository: https://github.com/galeone/tf-aoc.
The challenge has been way more interesting with respect to the day previous two of day 1 and day 2. Solving this puzzle allowed me to show how to exploit the native parallelism of TensorFlow for computing reduction operation in parallel over a Tensor.
The second part showed that
TensorArray is perhaps the more dynamic data structure offered by TensorFlow but it has some problems while working in graph-mode - and that’s a pity.
I solved puzzles 4 and 5 and both have been fun. The next articles about my pure TensorFlow solution for day 4 will arrive soon!
For any feedback or comment, please use the Disqus form below - thanks!