Tensorflow 2.0: models migration and new design

Tensorflow 2.0 will be a major milestone for the most popular machine learning framework: lots of changes are coming, all with the aim of making ML accessible to everyone. These changes, however, require existing users to completely re-learn how to use the framework: this article describes all the (known) differences between the 1.x and 2.x versions, focusing on the change of mindset required and highlighting the pros and cons of the new and old implementations.

This article can also be a good starting point for the novice: start thinking in the Tensorflow 2.0 way right now, so you don’t have to re-learn a new framework (at least until Tensorflow 3.0 is released).

Tensorflow 2.0: why and when?

The idea is to make Tensorflow easier to learn and apply.

The first glimpse of what Tensorflow 2.0 will be was given by Martin Wicke, one of the Google Brain engineers, on the Announcements mailing list. In short:

  • Eager execution will be a central feature of 2.0. It aligns users’ expectations about the programming model better with TensorFlow practice and should make TensorFlow easier to learn and apply.
  • Support for more platforms and languages, and improved compatibility and parity between these components via standardization on exchange formats and alignment of APIs.
  • Remove deprecated APIs and reduce the amount of duplication, which has caused confusion for users.
  • Public 2.0 design process: the community can now work together with the Tensorflow developers and discuss the new features, using the Tensorflow Discussion Group.
  • Compatibility and continuity: a compatibility module with Tensorflow 1.x will be offered. This means that Tensorflow 2.0 will have a module with all the Tensorflow 1.x API inside.
  • On-disk compatibility: the models exported in Tensorflow 1.x (checkpoints and frozen models) will be compatible with Tensorflow 2.0; only some variable renaming could be required.
  • tf.contrib: completely removed. Large, maintained modules will be moved to separate repositories; unused and unmaintained modules will be removed.

In practice, if you’re new to Tensorflow, you’re lucky. If, like me, you have been using Tensorflow since the 0.x releases, you have to rewrite your whole codebase (and, unlike the 0.x to 1.x transition, the changes are massive). The Tensorflow authors claim that a conversion tool will be released to help with the transition; conversion tools are not perfect, however, so manual intervention could be required.

Moreover, you have to change your way of thinking; this can be challenging, but everyone likes challenges, don’t they?

Let’s face this challenge and start looking at the changes in detail, starting from the first huge difference: the removal of tf.get_variable, tf.variable_scope, tf.layers and the mandatory transition to a Keras based approach, using tf.keras.

Just a note on the release date: it is not defined yet. But from the Tensorflow discussion group, we know that a preview could be released by the end of 2018 and the official release of 2.0 could be in Spring 2019.

Hence it is better to update the existing codebase as soon as the RFCs are accepted, in order to have a smooth transition to this new Tensorflow version.

Keras (OOP) vs Tensorflow 1.x

The RFC: Variables in TensorFlow 2.0 has been accepted. This RFC is probably the one with the biggest impact on the existing codebase and requires a new way of thinking for the old Tensorflow users.

As described in the article Understanding Tensorflow using Go, every variable has a unique name in the computational graph.

As an early Tensorflow user, I’m used to designing my computational graphs following this pattern:

  1. Which operations connect my variable nodes? Define the graph as multiple sub-graphs connected. Define every sub-graph inside a separate tf.variable_scope in order to define the variables of different graphs, inside different scopes and obtain a clear graph representation in Tensorboard.
  2. Do I have to use a sub-graph more than once in the same execution step? Be sure to exploit the reuse parameter of tf.variable_scope in order to avoid the creation of a new graph, prefixed with _n.
  3. The graph has been defined? Create the variable initialization op (how many times have you seen the tf.global_variables_initializer() call?)
  4. Load the graph into a Session and run it.
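The define-first, run-later workflow of the four steps above can be sketched with a toy, pure-Python model. The classes below (Placeholder, Variable, Op, Session) are hypothetical stand-ins written only for illustration; they are not TensorFlow APIs and do not reproduce how TensorFlow is implemented.

```python
# Toy illustration only: hypothetical classes, NOT the TensorFlow API.

class Placeholder:
    """A graph input whose value is supplied only at run time."""
    def __init__(self, name):
        self.name = name

class Variable:
    """A graph node holding state that must be initialized before use."""
    def __init__(self, name, initial_value):
        self.name = name
        self.initial_value = initial_value

class Op:
    """A node that combines the values of its input nodes with a function."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

class Session:
    """Evaluates nodes; nothing is computed until run() is called."""
    def __init__(self):
        self.values = {}  # variable name -> current value

    def initialize(self, variables):
        for variable in variables:
            self.values[variable.name] = variable.initial_value

    def run(self, node, feed_dict=None):
        feed_dict = feed_dict or {}
        if isinstance(node, Placeholder):
            return feed_dict[node]
        if isinstance(node, Variable):
            return self.values[node.name]  # KeyError if never initialized
        args = [self.run(i, feed_dict) for i in node.inputs]
        return node.fn(*args)

# Steps 1-2: describe the graph; no computation happens here.
x = Placeholder("x")
w = Variable("scope/w", 2.0)
b = Variable("scope/b", 1.0)
y = Op(lambda a, c: a + c, Op(lambda a, c: a * c, x, w), b)

# Step 3: initialize the variables. Step 4: run the graph in a session.
sess = Session()
sess.initialize([w, b])
result = sess.run(y, feed_dict={x: 3.0})
print(result)  # 7.0
```

Note how `y` is only a description of the computation until `sess.run` is invoked with concrete input values.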

The example that best shows these reasoning steps, IMHO, is how a simple GAN can be implemented in Tensorflow.

A GAN to understand Tensorflow 1.x

The GAN discriminator \(D\) must be defined using the reuse parameter of tf.variable_scope, because first we want to feed \(D\) with real samples, then we want to feed it again with fake samples, and only at the end compute the gradient of \(D\) w.r.t. its parameters.

The generator network \(G\), instead, is never used twice in the same iteration, hence there’s no need to worry about reusing its variables.

import tensorflow as tf  # Tensorflow 1.x API

def generator(inputs):
    """generator network.
    Args:
        inputs: a (None, latent_space_size) tf.float32 tensor
    Returns:
        G: the generator output node
    """
    with tf.variable_scope("generator"):
        fc1 = tf.layers.dense(inputs, units=64, activation=tf.nn.elu, name="fc1")
        fc2 = tf.layers.dense(fc1, units=64, activation=tf.nn.elu, name="fc2")
        G = tf.layers.dense(fc2, units=1, name="G")
    return G

def discriminator(inputs, reuse=False):
    """discriminator network
    Args:
        inputs: a (None, 1) tf.float32 tensor
        reuse: python boolean, if we expect to reuse (True) or declare (False) the variables
    Returns:
        D: the discriminator output node
    """
    with tf.variable_scope("discriminator", reuse=reuse):
        fc1 = tf.layers.dense(inputs, units=32, activation=tf.nn.elu, name="fc1")
        D = tf.layers.dense(fc1, units=1, name="D")
    return D

These two functions, when called, define 2 different sub-graphs inside the default graph, each one with its own scope (“generator” or “discriminator”). Please note that each function returns the output tensor of the sub-graph it defines, not the graph itself.

In order to share the same \(D\) graph, we define 2 inputs (real and fake) and define the loss functions required to train \(G\) and \(D\).

# Define the real input, a batch of values sampled from the real data
real_input = tf.placeholder(tf.float32, shape=(None,1))
# Define the discriminator network and its parameters
D_real = discriminator(real_input)

# Arbitrary size of the noise prior vector
latent_space_size = 100
# Define the input noise shape and define the generator
input_noise = tf.placeholder(tf.float32, shape=(None,latent_space_size))
G = generator(input_noise)

# now that we have defined the generator output G, we can give it in input to 
# D, this call of `discriminator` will not define a new graph, but it will
# **reuse** the variables previously defined
D_fake = discriminator(G, True)
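To make the role of the reuse flag more concrete, here is a toy, pure-Python sketch loosely modeled on the define-or-reuse behavior of tf.get_variable inside a tf.variable_scope. The global store and the helper functions are hypothetical and written only for illustration; this is not TensorFlow code.

```python
import contextlib

# Toy illustration only: a global variable store, NOT the TensorFlow API.
_VARIABLES = {}                          # full variable name -> value
_SCOPE = {"prefix": "", "reuse": False}  # current scope state

@contextlib.contextmanager
def variable_scope(name, reuse=False):
    """Prefix the names of the variables requested inside the block."""
    previous = dict(_SCOPE)
    _SCOPE["prefix"] = previous["prefix"] + name + "/"
    _SCOPE["reuse"] = reuse
    try:
        yield
    finally:
        _SCOPE.update(previous)

def get_variable(name, initial_value):
    """Define a new variable, or fetch an existing one when reuse is set."""
    full_name = _SCOPE["prefix"] + name
    if full_name in _VARIABLES:
        if not _SCOPE["reuse"]:
            raise ValueError(f"Variable {full_name} already exists")
        return full_name
    if _SCOPE["reuse"]:
        raise ValueError(f"Variable {full_name} does not exist")
    _VARIABLES[full_name] = initial_value
    return full_name

def discriminator(x, reuse=False):
    with variable_scope("discriminator", reuse=reuse):
        w = get_variable("w", 0.5)
    return x * _VARIABLES[w]

d_real = discriminator(2.0)              # defines discriminator/w
d_fake = discriminator(4.0, reuse=True)  # reuses discriminator/w
names = sorted(_VARIABLES)

try:
    discriminator(1.0)                   # second definition without reuse
    redefinition_failed = False
except ValueError:
    redefinition_failed = True

print(d_real, d_fake, names, redefinition_failed)
```

The same function either creates state or looks it up depending on a boolean, and all of that state lives in a global store: this is the design the rest of the article argues against.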

The last thing to do is to just define the 2 loss functions and the 2 optimizers required to train \(D\) and \(G\) respectively.

D_loss_real = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=D_real, labels=tf.ones_like(D_real))
)

D_loss_fake = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=D_fake, labels=tf.zeros_like(D_fake))
)

# D_loss, when invoked it first does a forward pass using the D_loss_real
# then another forward pass using D_loss_fake, sharing the same D parameters.
D_loss = D_loss_real + D_loss_fake

G_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=D_fake, labels=tf.ones_like(D_fake))
)

The loss functions are easily defined. The peculiarity of the adversarial training is that first \(D\) must be trained, using the real samples and the samples generated by \(G\). Then, the adversarial, \(G\), is trained using the result of the \(D\) evaluation as the input signal.
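For readers who want to see what these losses compute, the following plain-Python sketch evaluates the sigmoid cross-entropy on logits using the numerically stable form max(x, 0) - x * z + log(1 + exp(-|x|)). The logit values are made up for illustration and the helper names are assumptions; this is not the TensorFlow implementation.

```python
import math

def sigmoid_cross_entropy_with_logits(logit, label):
    """Stable form of -z*log(sigmoid(x)) - (1-z)*log(1-sigmoid(x))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def mean(values):
    return sum(values) / len(values)

# Made-up discriminator outputs (logits) for a batch of two samples.
D_real = [2.0, 0.5]   # D evaluated on real samples
D_fake = [-1.0, 0.0]  # D evaluated on generated samples

# D wants real samples labeled 1 and fake samples labeled 0.
D_loss_real = mean([sigmoid_cross_entropy_with_logits(x, 1.0) for x in D_real])
D_loss_fake = mean([sigmoid_cross_entropy_with_logits(x, 0.0) for x in D_fake])
D_loss = D_loss_real + D_loss_fake

# G wants D to label the fake samples as real (label 1).
G_loss = mean([sigmoid_cross_entropy_with_logits(x, 1.0) for x in D_fake])

print(D_loss, G_loss)
```

Both D_loss_fake and G_loss are computed from the same D_fake logits and differ only in the labels, which is why \(D\) and \(G\) pull in opposite directions.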

The adversarial training requires running these 2 training steps separately, but we have defined the models inside the same graph and we don’t want to update the \(G\) variables when we train \(D\), and vice-versa.

Since we defined every variable inside the default graph, every variable is global: we have to gather the correct variables in 2 different lists and be sure to define the optimizers so that they compute the gradients and apply the updates only to the correct sub-graphs.

# Gather D and G variables
D_vars = tf.trainable_variables(scope="discriminator")
G_vars = tf.trainable_variables(scope="generator")

# Define the optimizers and the train operations
train_D = tf.train.AdamOptimizer(1e-5).minimize(D_loss, var_list=D_vars)
train_G = tf.train.AdamOptimizer(1e-5).minimize(G_loss, var_list=G_vars)
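The effect of var_list can be illustrated with a toy, pure-Python sketch: all variables live in one global collection, they are selected by scope prefix, and the update step touches only the selected ones. The variable names, values and gradients below are made up, and the helpers are hypothetical stand-ins rather than TensorFlow functions.

```python
# Toy illustration only: one global collection of named variables.
variables = {
    "generator/fc1/kernel": 1.0,
    "generator/G/kernel": 2.0,
    "discriminator/fc1/kernel": 3.0,
    "discriminator/D/kernel": 4.0,
}

def trainable_variables(scope):
    """Return the names of the variables defined under the given scope."""
    return [name for name in variables if name.startswith(scope + "/")]

def apply_gradients(gradients, var_list, learning_rate):
    """Plain gradient descent step, restricted to the names in var_list."""
    for name in var_list:
        variables[name] -= learning_rate * gradients[name]

# Made-up gradient values, one per variable.
gradients = {name: 1.0 for name in variables}

D_vars = trainable_variables("discriminator")
apply_gradients(gradients, D_vars, learning_rate=0.5)

print(variables)
```

After this step only the two discriminator entries have changed, which is what passing var_list=D_vars to the optimizer is meant to guarantee.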

Here we go, we’re at step 3, graph defined so the last thing to do is to define the variables initialization op:

init_op = tf.global_variables_initializer()

Pros / Cons

The graph has been correctly defined and, when used inside the training loop and within a session, it works. However, from the software engineering point of view, there are certain peculiarities that are worth noting:

  1. The usage of tf.variable_scope context manager to change the (full) name of the variables defined by tf.layers: the same call to a tf.layers.* method in a different variable scope defines a new set of variables under a new scope.
  2. The boolean flag reuse can completely change the behavior of any call to a tf.layers.* method (define or reuse)
  3. Every variable is global: the variables defined by tf.layers calling tf.get_variable (that’s used inside tf.layers) are accessible from everywhere: tf.trainable_variables(prefix) used above to gather the 2 lists of variables perfectly describes this.
  4. Defining sub-graphs is not easy: you just can’t call discriminator and get a new, completely independent, discriminator. It is a little bit counterintuitive.
  5. The return value of a sub-graph definition (call to generator/discriminator) is only its output tensor and not something with all the graph information inside (although it is possible to backtrack to the input, it’s not that easy).
  6. Defining the variables initialization op is just boring (but this has been resolved by tf.train.MonitoredSession and tf.train.MonitoredTrainingSession. Hint: use them.)

Those 6 points are probably all cons.

We defined our GAN in the Tensorflow 1.x way: let’s start the migration to Tensorflow 2.0

A GAN to understand Tensorflow 2.x

As stated in the previous section, in Tensorflow 2.x the way of thinking changes. The removal of tf.get_variable, tf.variable_scope and tf.layers, and the mandatory transition to a Keras based approach using tf.keras, force Tensorflow developers to change their mindset.

We have to define the generator \(G\) and discriminator \(D\) using tf.keras: this will give us for free the variable sharing feature that we used to define \(D\), but implemented differently under the hood.

Please note: tf.layers will be removed, hence starting to use tf.keras right now to define your models is mandatory in order to be ready for 2.x.

def generator(input_shape):
    """generator network.
    Args:
        input_shape: the desired input shape (e.g.: (latent_space_size,))
    Returns:
        G: The generator model
    """
    inputs = tf.keras.layers.Input(input_shape)
    net = tf.keras.layers.Dense(units=64, activation=tf.nn.elu, name="fc1")(inputs)
    net = tf.keras.layers.Dense(units=64, activation=tf.nn.elu, name="fc2")(net)
    net = tf.keras.layers.Dense(units=1, name="G")(net)
    G = tf.keras.Model(inputs=inputs, outputs=net)
    return G

def discriminator(input_shape):
    """discriminator network.
    Args:
        input_shape: the desired input shape (e.g.: (1,))
    Returns:
        D: the discriminator model
    """
    inputs = tf.keras.layers.Input(input_shape)
    net = tf.keras.layers.Dense(units=32, activation=tf.nn.elu, name="fc1")(inputs)
    net = tf.keras.layers.Dense(units=1, name="D")(net)
    D = tf.keras.Model(inputs=inputs, outputs=net)
    return D

Look at the different approach: both generator and discriminator return a tf.keras.Model and not just an output tensor.

This means that using Keras we can instantiate our model and use the same model in different parts of the source code and we effectively use the variables of that model, without the problem of defining a new sub-graph prefixed with _n. In fact, differently from the 1.x version, we’re going to define just one \(D\) model and use it twice.

# Define the real input, a batch of values sampled from the real data 
real_input = tf.placeholder(tf.float32, shape=(None,1))

# Define the discriminator model
D = discriminator(real_input.shape[1:])

# Arbitrarily set the shape of the noise prior vector
latent_space_size = 100
# Define the input noise shape and define the generator
input_noise = tf.placeholder(tf.float32, shape=(None,latent_space_size))
G = generator(input_noise.shape[1:])

Again: there’s no need to define D_fake as we did above, and there’s no need to think ahead when defining the graphs and worry about the variable sharing.
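The object-oriented idea behind this can be sketched in plain Python: a model is an object that owns its variables, so calling the same instance twice shares them, while building a second instance gives a fully independent model. The Dense and Model classes below are toy, hypothetical stand-ins (one scalar weight per layer), not the tf.keras classes.

```python
# Toy illustration only: hypothetical classes, NOT tf.keras.

class Dense:
    """One-weight layer that owns its variable."""
    def __init__(self, weight, name):
        self.name = name
        self.weight = weight

    @property
    def trainable_variables(self):
        return [(self.name + "/kernel", self.weight)]

    def __call__(self, x):
        return x * self.weight

class Model:
    """A stack of layers that carries the variables of all its layers."""
    def __init__(self, layers):
        self.layers = layers

    @property
    def trainable_variables(self):
        return [v for layer in self.layers for v in layer.trainable_variables]

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

def discriminator():
    return Model([Dense(0.5, "fc1"), Dense(2.0, "D")])

D = discriminator()
d_real = D(1.0)  # first use of the model
d_fake = D(3.0)  # second use: same object, same weights, no reuse flag

D_other = discriminator()        # a new, completely independent model
D_other.layers[0].weight = 10.0  # changing it does not affect D

print(d_real, d_fake, D(1.0), D_other(1.0), D.trainable_variables)
```

The variables to optimize are simply an attribute of the object, so there is no global collection to filter by scope name.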

Now we can go on and define the \(G\) and \(D\) loss functions:

D_real = D(real_input)
D_loss_real = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=D_real, labels=tf.ones_like(D_real))
)

G_z = G(input_noise)

D_fake = D(G_z)
D_loss_fake = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=D_fake, labels=tf.zeros_like(D_fake))
)

D_loss = D_loss_real + D_loss_fake

G_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=D_fake, labels=tf.ones_like(D_fake))
)

So far so good. The last thing to do is to define the 2 optimizers that will optimize \(D\) and \(G\) separately. Since we’re using tf.keras, there’s no need to manually create the lists of variables to update: the tf.keras.Model objects themselves carry this attribute:

# Define the optimizers and the train operations
train_D = tf.train.AdamOptimizer(1e-5).minimize(D_loss, var_list=D.trainable_variables)
train_G = tf.train.AdamOptimizer(1e-5).minimize(G_loss, var_list=G.trainable_variables)

We’re ready to go: we reached step 3 and since we’re still working using the static graph mode, we have to define the variables initialization op:

init_op = tf.global_variables_initializer()


  • Transitioning from tf.layers to tf.keras is easy: all tf.layers methods have their own tf.keras.layers counterpart
  • tf.keras.Model completely removes the need to worry about variable reuse and graph redefinition issues
  • tf.keras.Model is not an output tensor, but a complete model that carries its own variables
  • We still have to initialize all variables, but as said before tf.train.MonitoredSession can do it for us

The GAN example, in both Tensorflow 1.x and 2.x, has been developed using the “old” paradigm of graph definition first, execution in a session next (that is and will be a good and valid paradigm to follow and - personal opinion - is the best one).

However, another big change in Tensorflow 2.x is making eager mode the default execution mode. In Tensorflow 1.x we have to explicitly enable eager execution, while in Tensorflow 2.x we’ll have to do the opposite.

Eager mode first

As stated in the Eager execution guide:

TensorFlow’s eager execution is an imperative programming environment that evaluates operations immediately, without building graphs: operations return concrete values instead of constructing a computational graph to run later. This makes it easy to get started with TensorFlow and debug models, and it reduces boilerplate as well. To follow along with this guide, run the code samples below in an interactive python interpreter.

Eager execution is a flexible machine learning platform for research and experimentation, providing:

  • An intuitive interface—Structure your code naturally and use Python data structures. Quickly iterate on small models and small data.
  • Easier debugging—Call ops directly to inspect running models and test changes. Use standard Python debugging tools for immediate error reporting.
  • Natural control flow—Use Python control flow instead of graph control flow, simplifying the specification of dynamic models.

In short: there’s no need to define the graph first and then evaluate it inside a session. Using Tensorflow in eager mode allows mixing definition and execution, exactly as in a standard python program.

There’s no 1:1 match with the static graph version, since things that are natural in a graph are not natural in an imperative environment.

The most important example here is the tf.GradientTape context manager, which eager mode relies on to compute gradients.

When we have a graph, we do know how nodes are connected and when we have to compute the gradient of a certain function we can backtrack from the output to the input of the graph, compute the gradient and get the result.

In eager mode we can’t. The only way to compute the gradient of a function using automatic differentiation is to build a graph. The graph of the operations executed within the tf.GradientTape context manager on some watchable element (like variables) is built and then we can ask the tape to compute the gradient we need.

On the tf.GradientTape documentation page we can find an example that clearly explains how and why tapes are needed:

x = tf.constant(3.0)
with tf.GradientTape() as g:
  # x is a constant, not a variable: it must be watched explicitly
  g.watch(x)
  y = x * x
dy_dx = g.gradient(y, x) # Will compute to 6.0
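To show why a tape is enough to recover gradients in an imperative setting, here is a minimal pure-Python sketch of the idea: every operation executed while the tape is active is recorded together with its local derivatives, and the gradient is obtained by walking the records backwards. The Tape and Value classes are hypothetical toys supporting only + and *; they are not how tf.GradientTape is implemented.

```python
# Toy illustration only: hypothetical classes, NOT tf.GradientTape.

class Tape:
    """Records the operations executed while it is active."""
    active = None

    def __init__(self):
        self.records = []  # (output, [(input, local_gradient), ...])

    def __enter__(self):
        Tape.active = self
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        Tape.active = None

    def gradient(self, target, source):
        """Reverse pass: apply the chain rule from target back to source."""
        grads = {id(target): 1.0}
        for out, parents in reversed(self.records):
            if id(out) not in grads:
                continue  # this operation does not contribute to target
            for parent, local in parents:
                grads[id(parent)] = grads.get(id(parent), 0.0) + grads[id(out)] * local
        return grads.get(id(source))  # None if source is not connected

class Value:
    """A scalar whose operations are recorded on the active tape."""
    def __init__(self, data):
        self.data = data

    def _record(self, out, parents):
        if Tape.active is not None:
            Tape.active.records.append((out, parents))
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data)
        return self._record(out, [(self, other.data), (other, self.data)])

    def __add__(self, other):
        out = Value(self.data + other.data)
        return self._record(out, [(self, 1.0), (other, 1.0)])

x = Value(3.0)
with Tape() as g:
    y = x * x
    z = x * x + x
dy_dx = g.gradient(y, x)      # d(x^2)/dx at x = 3
dz_dx = g.gradient(z, x)      # d(x^2 + x)/dx at x = 3

u = x * x                     # executed outside the tape: nothing recorded
du_dx = g.gradient(u, x)

print(dy_dx, dz_dx, du_dx)    # 6.0 7.0 None
```

Operations executed outside the `with` block leave no trace, so the tape cannot differentiate through them.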

Also, the control flow operations are just the python control flow operations (like for loops, if statements, …), unlike the tf.while_loop, tf.map_fn, tf.cond methods that we have to use in the static-graph version.

There’s a tool called AutoGraph that helps you write complicated graph code using normal Python. Behind the scenes, AutoGraph automatically transforms your code into the equivalent TensorFlow graph code.

However, the python code you need to write is not exactly pure python (for instance, you have to declare that a function returns a list of elements with a specified Tensorflow data type, using operations that you won’t use in a standard python function) and its capabilities, at least at the time of writing, are limited.

This tool has been created because the graph version has the great advantage of being “a single file” once exported, and therefore shipping trained machine learning models to a production environment is much easier using the static-graph mode. Also, the static-graph mode is faster.

Personally, I don’t like eager mode that much, probably because I’m used to the static graph version and I find eager mode a coarse imitation of PyTorch. Also, while trying to port a GAN from a PyTorch implementation to Tensorflow 2.x, using both the static graph and the eager mode version, I wasn’t able to get the eager one working and I still don’t know why (while the static graph implementation works perfectly). I opened a bug report on GitHub (but the error could be mine of course): Tensorflow eager version fails, while Tensorflow static graph works.

The transition to Tensorflow 2.x carries other changes that I tried to summarize in the What if? section below.

What if?

The following is a list of what I think will be the F.A.Q. about the transition to Tensorflow 2.x.

What if my project uses tf.contrib?

All the information about the fate of the projects inside tf.contrib can be found here: Sunsetting tf.contrib.

Probably you just have to pip install a new python package or rename your tf.contrib.something to tf.something.

What if a project working in Tensorflow 1.x stops working in 2.x?

This shouldn’t happen: please double check that the transition has been correctly implemented and if it is, open a bug report on GitHub.

What if a project works in static graph mode but it doesn’t in eager mode?

That’s a problem I’m currently facing, as I reported here: Tensorflow eager version fails, while Tensorflow static graph works.

Right now I don’t know if this is a bug on my side or if there’s something wrong in the current Tensorflow eager version. However, since I’m used to thinking in a static graph oriented way, I’ll just avoid using the eager version.

What if a method from tf. disappeared in 2.x?

There’s a high chance the method has only been moved. In Tensorflow 1.x there are a lot of aliases for a lot of methods; Tensorflow 2.x, instead, aims (if the RFC: TensorFlow Namespaces is accepted, as I hope) to remove a lot of these aliases and move methods to a better location, in order to increase the overall coherence.

In the RFC you can find the newly proposed namespaces, the list of the ones that will be removed, and all the other changes that (probably) will be made to increase the coherence of the framework.

Also, the conversion tool that will be released will probably be able to apply all these updates correctly for you (this is just my speculation on the conversion tool, but since it’s an easy task it will probably be a supported feature).
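To give an idea of why such a renaming pass is an easy task, here is a minimal sketch based on a lookup table and a regular expression. It is not the actual conversion tool: the upgrade function is hypothetical, and the table contains only two illustrative renames (tf.log to tf.math.log and tf.random_uniform to tf.random.uniform).

```python
import re

# Illustrative sample of old -> new names; a real table would be much larger.
RENAMES = {
    "tf.log": "tf.math.log",
    "tf.random_uniform": "tf.random.uniform",
}

# Match whole dotted names only, so that tf.logging is not mistaken for tf.log.
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(name) for name in sorted(RENAMES, key=len, reverse=True))
    + r")(?![\w.])"
)

def upgrade(source):
    """Rewrite every known old name found in the source string."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)

old_code = "y = tf.log(tf.random_uniform([2]))\ntf.logging.info(y)"
new_code = upgrade(old_code)
print(new_code)
```

A purely textual pass like this one cannot handle changes in argument names or semantics, which is one reason why manual intervention may still be needed.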


This article has been written with the specific aim of shedding light on the changes and the challenges that Tensorflow 2.0 will bring to us, the framework users.

The GAN implementation in Tensorflow 1.x and its conversion in Tensorflow 2.x should be a clear example of the mindset change required to work with the new version.

Overall I think Tensorflow 2.x will improve the quality of the framework, and it will standardize and simplify how to use it. New users who have never seen a static-graph approach and are used to working with imperative languages could find eager mode a good entry point to the Tensorflow world.

However, there are certain parts of the update that I don’t like (please note that these are just my personal opinions):

  • The focus on eager execution and making it the default: it looks too much like a marketing move to me. It looks like Tensorflow wants to chase PyTorch (eager by default)
  • The missing 1:1 compatibility between static-graph and eager (and the possibility of mixing them) could create a mess in big projects IMHO: it would be hard to maintain these projects
  • Switching to a Keras based approach is a good move, but it makes the graph visualized in Tensorboard really ugly. In fact, the variables and the graphs are defined globally, and the tf.name_scope (invoked every time a Keras Model is called, in order to share the variables easily) that creates a new “block” in the Tensorflow graph is separated from the graph it uses internally and has all the variables of the model in its list of input nodes. This makes the graph visualization of Tensorboard pretty much useless, and that’s a pity for such a good tool.

Thanks for reading!
