Analyzing tf.function to discover AutoGraph strengths and subtleties - part 3


In part 1 we learned how to convert TensorFlow 1.x code to its eager version, the eager version to its graph representation, and faced the problems that arise when working with functions that create a state.

In part 2 we learned that tf.function creates a new graph for every different input value if the input is not a tf.Tensor object but a Python native type, and how this could slow down (or speed up, if correctly used) the execution. Moreover, we highlighted the differences between the tf.autograph generated source code and what happens, instead, when using AutoGraph through tf.function.

In this third and last part, we analyze what happens when tf.function is used to convert a function that contains “complex” Python constructs in its body. Should we design functions thinking about how they are going to be converted?

AutoGraph capabilities and limitations

In the TensorFlow repository, in the python/autograph folder, we can find a document that explains the capabilities and limitations of the AutoGraph module, together with a list of the Python constructs it is able to convert.

The table in the section “Python Language Support Status” lists all the Python constructs that AutoGraph explicitly supports, plans to support, or won’t support. Among them we find the widely used while, for, and if statements, the Python built-ins print, len, and range, and the iterator construct.

In the next sections, we analyze various Python functions that use these constructs, to understand whether the function body gets converted as we expect or whether we need to design the functions with the graph conversion in mind.

if … else

Here’s the function we are going to analyze:

import tensorflow as tf

@tf.function
def if_else(a, b):
  if a > b:
    tf.print("a > b", a, b)
  else:
    tf.print("a <= b", a, b)

It’s trivial: when a is greater than b, it prints a > b followed by the values of a and b; otherwise, it prints a <= b and their values.

Step 1: graph conversion

As seen in the previous articles, the tf.autograph package can be used to inspect the result of the graph conversion.

print(tf.autograph.to_code(if_else.python_function))

The generated code is:

def tf__if_else(a, b):
    cond = a > b

    def get_state():
        return ()

    def set_state(_):
        pass

    def if_true():
        ag__.converted_call(
            "print",
            tf,
            ag__.ConversionOptions(
                recursive=True,
                force_conversion=False,
                optional_features=(),
                internal_convert_user_code=True,
            ),
            ("a > b", a, b),
            None,
        )
        return ag__.match_staging_level(1, cond)

    def if_false():
        ag__.converted_call(
            "print",
            tf,
            ag__.ConversionOptions(
                recursive=True,
                force_conversion=False,
                optional_features=(),
                internal_convert_user_code=True,
            ),
            ("a <= b", a, b),
            None,
        )
        return ag__.match_staging_level(1, cond)

    ag__.if_stmt(cond, if_true, if_false, get_state, set_state)

The conversion is trivial too: if_stmt maps, more or less, to the tf.cond function. The first parameter is the condition to check, the second is the branch to take when the condition is True, and the third is the branch to take otherwise. The get_state and set_state functions basically do nothing, and we can safely ignore them.
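To make the mapping concrete, here is a minimal hand-written sketch of the same branching expressed directly with tf.cond. It is not the code AutoGraph generates: the name if_else_cond is hypothetical, and the branches return a string tensor instead of calling tf.print so that the selected branch can be checked.

```python
import tensorflow as tf

@tf.function
def if_else_cond(a, b):
    # tf.cond evaluates the predicate when the graph runs and executes only
    # the selected branch; each branch is a zero-argument callable.
    return tf.cond(
        tf.math.greater(a, b),
        lambda: tf.constant("a > b"),
        lambda: tf.constant("a <= b"),
    )

x = tf.constant(1)
print(if_else_cond(x, x).numpy())  # b'a <= b'
```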

Step 2: execution

As seen in part 2, tf.function by design does not box Python native types; therefore, we use a tf.Tensor produced by a tf.constant operation as input.

x = tf.constant(1)
if_else(x, x)

As expected, the output is: a <= b 1 1.
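The reason for passing a tf.Tensor instead of a Python int can be checked with a minimal sketch of the retracing behavior recalled from part 2. It follows the side-effect pattern used in the TensorFlow tf.function guide: the Python list traces is appended to only while a new graph is being traced. The names traces and add_one are illustrative, and TensorFlow 2.x is assumed.

```python
import tensorflow as tf

traces = []

@tf.function
def add_one(a):
    # Plain Python side effect: it runs only during tracing, so the length
    # of the list counts how many graphs tf.function has built.
    traces.append(a)
    return a + 1

add_one(1)
add_one(2)               # new Python value: a second graph is traced
add_one(tf.constant(1))  # first scalar int32 tensor: a third graph
add_one(tf.constant(2))  # same dtype and shape: the third graph is reused
print(len(traces))       # 3
```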

if … elif … else

Let’s change the function a little bit, adding an elif statement. The function now is:

@tf.function
def if_elif(a, b):
  if a > b:
    tf.print("a > b", a, b)
  elif a == b:
    tf.print("a == b", a, b)
  else:
    tf.print("a < b", a, b)

Step 1: graph conversion

The generated function, with the tf.print conversions and the (get|set)_state function definitions removed, is:

def tf__if_elif(a, b):
    cond_1 = a > b

    def if_true_1():
        # tf.print("a > b", a, b)
        return ag__.match_staging_level(1, cond_1)

    def if_false_1():
        cond = a == b

        def if_true():
            # tf.print("a == b", a, b)
            return ag__.match_staging_level(1, cond)

        def if_false():
            # tf.print("a < b", a, b)
            return ag__.match_staging_level(1, cond)

        ag__.if_stmt(cond, if_true, if_false, get_state, set_state)
        return ag__.match_staging_level(1, cond_1)

    ag__.if_stmt(cond_1, if_true_1, if_false_1, get_state_1, set_state_1)

The conversion seems correct: two nested tf.cond. The inner tf.cond is defined inside the false branch of the outer one. The outer tf.cond checks whether a > b: if it is True, it prints a > b; otherwise, it executes the if_false_1 branch, which contains the inner tf.cond.

The inner tf.cond checks the equality condition cond = a == b; if it holds, it prints a == b, otherwise it prints a < b.
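The same nested structure can be written by hand with two tf.cond calls, which is roughly what the generated code amounts to. This is a sketch with a hypothetical function name; note that its conditions use tf.math.greater and tf.math.equal rather than the > and == operators, for reasons discussed in the lessons that follow.

```python
import tensorflow as tf

@tf.function
def if_elif_cond(a, b):
    # Outer tf.cond checks a > b; its false branch contains the inner
    # tf.cond that separates a == b from a < b (like if_false_1 above).
    return tf.cond(
        tf.math.greater(a, b),
        lambda: tf.constant("a > b"),
        lambda: tf.cond(
            tf.math.equal(a, b),
            lambda: tf.constant("a == b"),
            lambda: tf.constant("a < b"),
        ),
    )

x = tf.constant(1)
print(if_elif_cond(x, x).numpy())  # b'a == b'
```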

Step 2: execution

x = tf.constant(1)
if_elif(x, x)

Executing it, we expect to see a == b 1 1, since the two values are equal. However, the output is a < b 1 1. WHAT?

OK then, debug time.


Update (14 Sept 2019): as Raphael Meudec pointed out on Twitter, this behavior has been changed in TensorFlow 2.0-rc0 and it now works as expected. However, the lessons presented later in the article are still valid: following them helps you write idiomatic TensorFlow 2.0 code.


Step 3: debugging

The AutoGraph representation looks correct. Moreover, we can try the non-converted function to check whether everything goes as expected in eager mode.

x = tf.constant(1)
if_elif.python_function(x, x)

In eager mode the output is correct: a == b 1 1. So we expect to see the same output when we feed the function with two distinct tf.Tensor objects that hold the same value:

x, y = tf.constant(1), tf.constant(1)
if_elif.python_function(x, y)

Surprise! The output is a < b 1 1. What’s going on?

Lesson 1: not all operators are created equal

This lesson is not about AutoGraph or tf.function but is about tf.Tensor.

This “weird” behavior, which also happens when eager mode is enabled, is due to the particular way the __eq__ operator has been overridden for tf.Tensor objects.

There is a question on Stack Overflow and a related GitHub issue about this. In short: the __eq__ operator has been overridden, but it does not use tf.equal to check for Tensor equality; it just checks for Python object identity (if you are familiar with the Java programming language, this is precisely like the == operator used on String objects). The reason is that tf.Tensor objects need to be hashable, since they are used throughout the TensorFlow codebase as keys of dict objects.
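The identity-based semantics can be illustrated in pure Python. The class below is a hypothetical stand-in, not TensorFlow code: it overrides __eq__ to compare object identity and __hash__ to use id(), which is what keeps its instances usable as dict keys. Two objects holding the same value compare as different, just as the two tf.Tensor objects did above.

```python
class IdentityEqTensor:
    """Hypothetical stand-in mimicking identity-based __eq__ semantics."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Identity check, like Java's == on String objects: the stored
        # values are never compared.
        return self is other

    def __hash__(self):
        # Consistent with __eq__: equal (identical) objects share a hash,
        # so instances can be used as dict keys.
        return id(self)

x = IdentityEqTensor(1)
y = IdentityEqTensor(1)
print(x == x)  # True: same object
print(x == y)  # False: same value, different objects
cache = {x: "x", y: "y"}
print(len(cache))  # 2: two distinct keys
```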

To solve it, we must not rely on the __eq__ operator; we use tf.equal instead to check whether the equality holds.
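A minimal sketch of the value-based comparison: tf.math.equal (also exported as tf.equal) works element-wise and returns a boolean tensor, independently of how == behaves on tf.Tensor in a given TensorFlow version. For non-scalar inputs, tf.reduce_all collapses the element-wise result into a single boolean.

```python
import tensorflow as tf

x, y = tf.constant(1), tf.constant(1)
# Compares the values held by two distinct tensors.
same = tf.math.equal(x, y)
print(same.numpy())  # True

# Element-wise comparison, then reduction to one boolean.
u, v = tf.constant([1, 2, 3]), tf.constant([1, 2, 4])
all_equal = tf.reduce_all(tf.math.equal(u, v))
print(all_equal.numpy())  # False
```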

However, something should still sound strange: why, when invoking the graph-converted function passing the same tf.Tensor x twice, does the execution produce a < b 1 1 instead of a == b 1 1, as it happens in eager execution?

Lesson 2: how AutoGraph (doesn't) convert the operators

So far we assumed that AutoGraph is able to translate not only the if, elif, and else statements to their graph equivalent, but also the Python built-in operators like __eq__, __gt__, and __lt__. In practice, this conversion (still?) does not happen at all.

In the previously converted graph code, the two conditions are expressed as a > b and a == b, and not as function calls to AutoGraph-converted functions (ag__.converted_call(...)).

In practice, what happens is that the cond is always False. We can verify this assertion by adding another elif to the previous function and calling it again.

@tf.function
def if_elif(a, b):
  if a > b:
    tf.print("a > b", a, b)
  elif a == b:
    tf.print("a == b", a, b)
  elif a < b:
    tf.print("a < b", a, b)
  else:
    tf.print("wat")

x = tf.constant(1)
if_elif(x, x)

Output: wat.

Hurray?

Lesson 3: how to write a function

To have the very same behavior in both eager and graph execution we have to know that:

  1. The semantics of the operations matter.
  2. Some operators have been overridden following different semantics (with respect to the most natural ones, common in Python).
  3. AutoGraph converts Python statements (if, elif, …) naturally, but designing a function that is going to be decorated with tf.function requires some extra care.

In practice, and this is the most important lesson, use the TensorFlow operators explicitly everywhere (in the end, the Graph is still present, and we are building it!).

Thus, we can write a function that behaves correctly both in eager mode and when graph-converted by using the tf. methods explicitly.

@tf.function
def if_elif(a, b):
  if tf.math.greater(a, b):
    tf.print("a > b", a, b)
  elif tf.math.equal(a, b):
    tf.print("a == b", a, b)
  elif tf.math.less(a, b):
    tf.print("a < b", a, b)
  else:
    tf.print("wat")

The generated graph code now looks like (removed long parts for clarity):

def tf__if_elif(a, b):
    cond_2 = ag__.converted_call("greater", ...)  # a > b

    def if_true_2():
        ag__.converted_call("print", ...)  # tf.print a > b
        return ag__.match_staging_level(1, cond_2)

    def if_false_2():
        cond_1 = ag__.converted_call("equal", ...)  # tf.math.equal

        def if_true_1():
            ag__.converted_call("print", ...)  # tf.print a == b
            return ag__.match_staging_level(1, cond_1)

        def if_false_1():
            cond = ag__.converted_call("less", ...)  # a < b

            def if_true():
                ag__.converted_call("print", ...)  # tf.print a < b
                return ag__.match_staging_level(1, cond)

            def if_false():
                ag__.converted_call("print", ...)  # tf.print wat
                return ag__.match_staging_level(1, cond)

            ag__.if_stmt(cond, if_true, if_false, get_state, set_state)
            return ag__.match_staging_level(1, cond_1)

        ag__.if_stmt(cond_1, if_true_1, if_false_1, get_state_1, set_state_1)
        return ag__.match_staging_level(1, cond_2)

    ag__.if_stmt(cond_2, if_true_2, if_false_2, get_state_2, set_state_2)

Now that every single part of the function has been converted (note the ag__.converted_call everywhere), the function works as we want, also when it is converted to its graph representation.
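One way to check this without reading the tf.print output is a variant that records which branch was taken and returns it. The name compare is hypothetical and the sketch assumes TensorFlow 2.x; result is assigned in every branch, so AutoGraph can merge it after the conditional. Calling both the tf.function and its python_function shows that eager and graph execution agree.

```python
import tensorflow as tf

@tf.function
def compare(a, b):
    # Explicit TensorFlow comparison ops, as recommended in Lesson 3.
    if tf.math.greater(a, b):
        result = tf.constant("a > b")
    elif tf.math.equal(a, b):
        result = tf.constant("a == b")
    elif tf.math.less(a, b):
        result = tf.constant("a < b")
    else:
        result = tf.constant("wat")
    return result

x = tf.constant(1)
print(compare(x, x).numpy())                  # graph: b'a == b'
print(compare.python_function(x, x).numpy())  # eager: b'a == b'
```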

for … in range

Following the previous three lessons, writing a function that uses a for loop is trivial. To be entirely sure that the code is correctly graph-converted, we can design the function using the TensorFlow tf. methods explicitly to help the conversion. So, for a simple function that sums the numbers from 0 to upto - 1, the correct design uses:

  1. An external tf.Variable, since the function creates a state and from part 1 we know how to deal with it.
  2. tf.range instead of range, since tf.range exists and using the TensorFlow operator explicitly is in line with Lesson 3.

x = tf.Variable(1)
@tf.function
def test_for(upto):
  for i in tf.range(upto):
    x.assign_add(i)

x.assign(tf.constant(0))
test_for(tf.constant(5))
print("x value: ", x.numpy())

The value of the x variable is 10, as expected.

The reader is invited to convert the function to its graph representation and check if every statement has been correctly converted.

Question (please feel free to answer in the comment section!): what happens if the line x.assign_add(i) is replaced by x = x + i?

Conclusions

Writing functions that work correctly both in eager mode and in their graph-converted representation requires knowing some subtleties, which this three-part article has highlighted. To summarize them:

  • Functions that create a state need a dedicated design: in eager mode they just work, while when converted the stateful objects can create problems (part 1).
  • AutoGraph does not box Python native types, and this can slow down the execution a lot (part 2); use tf.Tensor whenever possible!
  • tf.print and print are different objects; there is a clear distinction between the first call (AutoGraph + function execution + tracing) and any other call of the graph-converted function (part 2).
  • The operator overloading of tf.Tensor has its own peculiarities. In order to be 100% confident in your function design, and to make it work also when it is graph-converted, I highly recommend using the TensorFlow operators explicitly (call tf.equal(a, b) instead of a == b, and so on).

Announcement

The article is finished, but I hope to say something pleasing by announcing that I’m authoring my first book about TensorFlow 2.0 and Neural Networks!

Hands-On Neural Networks with TensorFlow 2.0

Understand TensorFlow, from static graph to eager execution, and design neural networks

The book is divided into two parts: the first part is more theoretical and is about machine learning and neural networks, with a focus on the intuitive idea behind the presented concepts. The second part, which is the main topic of the book, is about the TensorFlow architecture (from 1.x to 2.0), followed by the implementation of several neural-network-based solutions to challenging machine learning problems, all using TensorFlow 2.0.

