The day 10 challenge projects us in the world of syntax checkers and autocomplete tools. In this article, we’ll see how TensorFlow can be used as a generic programming language for implementing a toy syntax checker and autocomplete.

## Day 10: Syntax Scoring

You can click on the title above to read the full text of the puzzle. The TLDR version is:

The puzzle gives us a dataset of lines with different lengths, each of them contains several **chunks**. A chunk is nothing but some text that starts with a character and closes with another character. In particular:

- If a chunk opens with
`(`

, it must close with`)`

. - If a chunk opens with
`[`

, it must close with`]`

. - If a chunk opens with
`{`

, it must close with`}`

. - If a chunk opens with
`<`

, it must close with`>`

.

The dataset looks like

```
[(()[<>])]({[<{<<[]>>(
{([(<{}[<>[]}>{[]{[(<()>
(((({<>}<{<{<>}{[]{[]{}
[[<[([]))<([[{}[[()]]]
[{[{({}]{}}([{[{{{}}([]
{<[[]]>}<{[{[{[]{()[[[]
[<(<(<(<{}))><([]([]()
<{([([[(<>()){}]>(<<{{
<{([{{}}[<[[[<>{}]]]>[]]
```

where some lines are **corrupted** others are **incomplete**.

- A
**corrupted**line is one where a chunk closes with the wrong character. - An
**incomplete**line is missing some closing characters at the end of the line.

Part one asks us to implement a **syntax checker**. So, given the example dataset, our program should detect errors like

Expected

`]`

, but found`}`

instead.

at the third line (that’s the first corrupted line): `{([(<{}[<>[]}>{[]{[(<()>`

.

Every time we detect a corrupted line, we should stop our parsing for that line, **take the first illegal character** on the line and use this **lookup table** to compute a score.

Char | Points |
---|---|

`)` |
3 |

`]` |
57 |

`}` |
1197 |

`>` |
25137 |

The final score is the sum of all the errors detected in all the corrupted lines.

### Design phase: part one

For detecting the first wrong character we need to tokenize the string and consider every character has a language token.

We know from the rules that every opening token as a corresponding closing token. This means that every time we found an opening token we can **push into a stack** the corresponding closing token.
As soon as we detect a closing token, we pop the expected one from the stack. If the popped one does not correspond to the token under analysis: we found a **corrupted** line, and we also know what the expected char is, and what we found instead. Problem solved!

If instead, we reached the end of the line and our stack is not empty, it means or line is **incomplete**.

### Input pipeline

We create a `tf.data.Dataset`

object for reading the text file line-by-line as usual. Since we work with characters, we can directly map the function `tf.strings.bytes_split`

that given a `tf.string`

explodes it in a batch of `tf.string`

with length 1.

```
dataset = tf.data.TextLineDataset("input").map(tf.strings.bytes_split)
```

We can now define our TensorFlow program named `Tokenizer`

.

### Finding corrupted lines

We can define a `Tokenizer`

class that contains all the mapping defined in the requirements. We have all the opening tokens, the closing tokens, the mapping between opening and closing, and the scores to use when we find a wrong closing token.

```
class Tokenizer(tf.Module):
def __init__(self):
super().__init__()
self._opening_tokens = tf.constant(["(", "[", "{", "<"])
self._closing_tokens = tf.constant([")", "]", "}", ">"])
self._syntax_score_table = tf.lookup.StaticHashTable(
tf.lookup.KeyValueTensorInitializer(
self._closing_tokens,
tf.constant([3, 57, 1197, 25137], tf.int64),
),
default_value=tf.constant(-1, tf.int64),
)
self._open_close = tf.lookup.StaticHashTable(
tf.lookup.KeyValueTensorInitializer(
self._opening_tokens,
self._closing_tokens,
),
default_value="",
)
self._close_open = tf.lookup.StaticHashTable(
tf.lookup.KeyValueTensorInitializer(
self._closing_tokens,
self._opening_tokens,
),
default_value="",
)
self._pos = tf.Variable(0, dtype=tf.int64)
self._corrupted_score = tf.Variable(0, dtype=tf.int64)
```

TensorFlow offers us a constant hashtable that perfectly fits our needs: `tf.lookup.StaticHashTable`

. All the data is constant and we know this mapping in advance.

Since we are writing a TensorFlow program, the `tf.Variable`

objects must be defined outside the `@tf.function`

-decorated methods. In this case, we defined the `_corrupted_score`

that will hold the final score and the `_pos`

score that will use to index a `tf.TensorArray`

we use a **stack**.

The detection of the corrupted lines and the computation of the score is precisely what we described in the previous design phase.

```
@tf.function
def corrupted(self, dataset):
for line in dataset:
stack = tf.TensorArray(tf.string, size=0, dynamic_size=True)
self._pos.assign(0)
for position in tf.range(tf.size(line)):
current_token = line[position]
if tf.reduce_any(tf.equal(current_token, self._opening_tokens)):
stack = stack.write(tf.cast(self._pos, tf.int32), current_token)
self._pos.assign_add(1)
else:
expected_token = self._open_close.lookup(
stack.read(tf.cast(self._pos - 1, tf.int32))
)
self._pos.assign_sub(1)
if tf.not_equal(current_token, expected_token):
tf.print(
position,
": expected: ",
expected_token,
" but found ",
current_token,
" instead",
)
self._corrupted_score.assign_add(
self._syntax_score_table.lookup(current_token)
)
break
return self._corrupted_score
```

Every line parsing requires its own stack, that’s why the `tf.TensorArray`

stack is defined inside the loop.

### Execution

Here we go!

```
tokenier = Tokenizer()
tf.print("Part one: ", tokenier.corrupted(dataset))
```

Part one is easily solved. Let’s see what part two is about.

## Design phase: part 2

This second part asks us to discard the corrupted lines and focus on the **incomplete** lines. The puzzle wants us to implement an autocomplete system for incomplete (but not corrupted) lines.

For example, given the line `[({(<(())[]>[[{[]{<()<>>`

our autocomplete system should be able to generate the correct sequence of closing characters: `}}]])})]`

.

The puzzle goal is to compute a score following this rule:

Start with a total score of 0. Then, for each character, multiply the total score by 5 and then increase the total score by the point value given for the character in the following table:

Char | Points |
---|---|

`)` |
1 |

`]` |
2 |

`}` |
3 |

`>` |
4 |

The puzzle asks us to find “the winner”. The winner is found by sorting all of the scores and then taking the middle score.

### Implementing the autocomplete

We know from the previous design phase how to detect **incomplete** lines (when the stack is not empty) and inside the stack, we have **in reversed order** the expected closing characters. Implementing the required algorithm is straightforward.

We first need to add a new lookup table that associates the points with the closing tokens

```
self._autocomplete_score_table = tf.lookup.StaticHashTable(
tf.lookup.KeyValueTensorInitializer(
self._closing_tokens,
tf.constant([1, 2, 3, 4], tf.int64),
),
default_value=tf.constant(-1, tf.int64),
)
```

then, we can implement the `incomplete`

method by replicating the very same code used for the `corrupted`

method and extending it a bit. Since we need to find “the winner” we need another `tf.TensorArray`

for storing every autocomplete score computed.

```
@tf.function
def incomplete(self, dataset):
scores = tf.TensorArray(tf.int64, size=0, dynamic_size=True)
for line in dataset:
# identical loop code of `corrupted`
# [ omitted ]
# visible here
# https://github.com/galeone/tf-aoc/blob/main/10/main.py#L87
if tf.not_equal(self._pos, 0): # stack not completely unrolled
unstacked = tf.squeeze(
tf.reverse(
tf.expand_dims(stack.stack()[: self._pos], axis=0), axis=[1]
)
)
closing = self._open_close.lookup(unstacked)
tf.print("Unstacked missing part: ", closing, summarize=-1)
# Use pos variable as line score
self._pos.assign(0)
for idx in tf.range(tf.shape(closing)[0]):
char = closing[idx]
self._pos.assign(self._pos * 5)
self._pos.assign_add(self._autocomplete_score_table.lookup(char))
scores = scores.write(scores.size(), self._pos)
# sort the scores
scores_tensors = tf.sort(scores.stack())
# tf.print(scores_tensors)
return scores_tensors[(tf.shape(scores_tensors)[0] - 1) // 2]
```

Here we go! Challenge 10 is solved in pure TensorFlow just using a pair of stacks and some static lookup table :)

## Conclusion

You can see the complete solution in folder `10`

on the dedicated Github repository: https://github.com/galeone/tf-aoc.

Solving this problem has been straightforward and TensorFlow has proved enough flexibility for being used to solve all the problems faced so far, without the need of any external library. Will we find a problem impossible to solve in pure TensorFlow? Who knows! I’m currently solving problem 13 and TensorFlow is still showing to be flexible enough for solving it.

If you missed the articles about the previous days’ solutions, here’s a handy list:

The next article will be about my solution to Day 11 problem. That problem shares some similarities with the Day 9 solution - I’ll re-use some computer vision concepts like the pixel neighborhood since the problem is again organized as a grid of numbers with some relations among each other.

For any feedback or comment, please use the Disqus form below - thanks!