Understanding Tensorflow using Go

Tensorflow is not a Machine Learning specific library, instead, is a general purpose computation library that represents computations with graphs. Its core is implemented in C++ and there are also bindings for different languages. The bindings for the Go programming language, differently from the Python ones, are a useful tool not only for using Tensorflow in Go but also for understanding how Tensorflow is implemented under the hood.

The bindings

Officially, the Tensorflow’s developers released:

  • The C++ source code: the real Tensorflow core where the high & low level operations are concretely implemented.
  • The Python bindings & the Python library: the bindings are automatically generated from the C++ implementation, in this way we can use Python to invoke C++ functions: that’s how, for instance, the core of numpy is implemented. The library, moreover, combines calls to the bindings in order to define the higher level API that everyone’s using Tensorflow knows well.
  • The Java bindings
  • The Go binding

Being a Gopher and not a Java lover, I started looking at the Go bindings in order to understand what kind of tasks they were created for.

The Go bindings

Tensorflow & Go

The Gopher (created by Takuya Ueda (@tenntenn). Licensed under the Creative Commons 3.0 Attributions license)
overlapping the Tensorflow Logo.

The first thing to note is that the Go API, for admission of the maintainers itself, lacks the Variable support: this API is designed to use trained models and not for training models from scratch. This is clearly stated in the Installing Tensorflow for Go:

TensorFlow provides APIs for use in Go programs. These APIs are particularly well-suited to loading models created in Python and executing them within a Go application.

If we’re not interested in training ML models: hooray! If, instead, you’re interested in training models here’s an advice:

Be a real gopher, keep it simple! Use Python to define & train models; you can always load trained models and using them with Go later!

In short: the go bindings can be used to import and define constants graphs; where constant, in this context, means that there’s no training process involved and thus no trainable variables.

Let’s now start diving into Tensorflow using Go: let’s create our first application.

In the following, I suppose that the reader has its Go environment ready and the Tensorflow bindings compiled and installed as explained in the README.

Understand Tensorflow structure

Let’s repeat what Tensorflow is (kept from the Tensorflow website, the emphasis is mine):

TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.

We can think of Tensorflow as a descriptive language, a bit like SQL, in which you describe what you want and let the underlying engine (the database) parse your query, check for syntactic and semantic errors, convert it to its private representation, optimize it and compute the results: all this to give you the correct results.

Therefore, what we really do when we use any of the available APIs is to describe a graph: the evaluation of the graph starts when we place it into a Session and explicitly decide to Run the graph within the Session.

Knowing this, let’s try to define a computational graph and evaluate it within a Session. The API documentation gives us a pretty clear list of the available methods within the packages tensorflow (shorthanded tf) & op.

As we can see, these two packages contains everything we need to define and evaluate a graph.

The former contains the functions to construct the basic “empty” structures like the Graph itself, the latter is the most important package that contains the bindings automatically generated from the C++ implementation.

However, suppose that we want to compute the matrix multiplication between \(A\) and \(x\) where

\[A = \left(\begin{array}{cc}1 & 2\\-1 & -2\end{array}\right), x = \left(\begin{array}{c}10\\100\end{array}\right)\]

I suppose that the reader is already familiar with the tensorflow graph definition idea and knows what placeholders are and how they work. The code below is the first attempt that a Tensorflow Python bindings user would make. Let’s call this file attempt1.go

package main

import (
	tf "github.com/tensorflow/tensorflow/tensorflow/go"

func main() {
	// Let's describe what we want: create the graph

	// We want to define two placeholder to fill at runtime
	// the first placeholder A will be a [2, 2] tensor of integers
	// the second placeholder x will be a [2, 1] tensor of intergers

	// Then we want to compute Y = Ax

	// Create the first node of the graph: an empty node, the root of our graph
	root := op.NewScope()

	// Define the 2 placeholders
	A := op.Placeholder(root, tf.Int64, op.PlaceholderShape(tf.MakeShape(2, 2)))
	x := op.Placeholder(root, tf.Int64, op.PlaceholderShape(tf.MakeShape(2, 1)))

	// Define the operation node that accepts A & x as inputs
	product := op.MatMul(root, A, x)

	// Every time we passed a `Scope` to an operation, we placed that
	// operation **under** that scope.
	// As you can see, we have an empty scope (created with NewScope): the empty scope
	// is the root of our graph and thus we denote it with "/".

	// Now we ask tensorflow to build the graph from our definition.
	// The concrete graph is created from the "abstract" graph we defined
	// using the combination of scope and op.

	graph, err := root.Finalize()
	if err != nil {
		// It's useless trying to handle this error in any way:
		// if we defined the graph wrongly we have to manually fix the definition.

		// It's like a SQL query: if the query is not syntactically valid
		// we have to rewrite it

	// If here: our graph is syntatically valid.
	// We can now place it within a Session and execute it.

	var sess *tf.Session
	sess, err = tf.NewSession(graph, &tf.SessionOptions{})
	if err != nil {

	// In order to use placeholders, we have to create the Tensors
	// containing the values to feed into the network
	var matrix, column *tf.Tensor

	// A = [ [1, 2], [-1, -2] ]
	if matrix, err = tf.NewTensor([2][2]int64{ {1, 2}, {-1, -2} }); err != nil {
	// x = [ [10], [100] ]
	if column, err = tf.NewTensor([2][1]int64{ {10}, {100} }); err != nil {

	var results []*tf.Tensor
	if results, err = sess.Run(map[tf.Output]*tf.Tensor{
		A: matrix,
		x: column,
	}, []tf.Output{product}, nil); err != nil {
	for _, result := range results {

The code is completely commented and I invite the reader to read every single comment.

Now, the Tensorflow-Python user expects that this code compiles and works fine. Let’s see if he’s right:

go run attempt1.go

Here’s what he got:

panic: failed to add operation "Placeholder": Duplicate node name in graph: 'Placeholder'

wait: what’s going on here? Apparently, there are 2 operations “Placeholder” with the same name “Placeholder”.

Lesson 1: node IDs

The Python API generates different nodes everytime we invoke a method to define an operation, no matter if this has been already called before. In fact, the following code returns 3 without problems.

import tensorflow as tf
a = tf.placeholder(tf.int32, shape=())
b = tf.placeholder(tf.int32, shape=())
add = tf.add(a,b)
sess = tf.InteractiveSession()
print(sess.run(add, feed_dict={a: 1,b: 2}))

We can verify that this program creates two different nodes printing the placeholder names: print(a.name, b.name) produces Placeholder:0 Placeholder_1:0. Thus, the b placeholder is Placeholder_1:0 whilst the a placeholder is Placeholder:0.

In Go, instead, the previous program fails because A and x are both called Placeholder. We can conclude that:

The Go API does not automatically generate new names every time we invoke a function that defines an operation: the operation name is thus fixed and we can’t change it.

Question time:

  • What do we have learned about the Tensorflow architecture? Every node in a graph must have a unique name. Every node is identified by its name.
  • Is the name of the node the same of the operation that defined it? Yes, or better, not completely, it’s just its last part

To clarify the second answer, let’s fix the duplicate node name problem.

Lesson 2: Scoping

As we just saw, the Python API automatically creates a new name every time an operation is defined. Under the hood, the Python API invokes the C++ method WithOpName of the class Scope. Here are the method documentation and its signature, kept from scope.h:

/// Return a new scope. All ops created within the returned scope will have
/// names of the form <name>/<op_name>[_<suffix].
Scope WithOpName(const string& op_name) const;

We can note that this method, used to name the nodes returns a Scope, thus a node name is, in reality, a Scope. A Scope is a complete path from the root / (empty graph) to the op_name.

The method WithOpName adds a suffix _<suffix> (where <suffix> is a counter) when we try to add a node that has the same path from / to op_name and thus it would be a duplicate node under the same scope.

Knowing this, to solve the issue of duplicated node names we expect to find the method WithOpName in the type Scope. Sadly, this method is not present.

Instead, looking at the documentation of the type Scope we can see that the only method that returns a new Scope is SubScope(namespace string).

Quoting the documentation:

SubScope returns a new Scope which will cause all operations added to the graph to be namespaced with ‘namespace’. If namespace collides with an existing namespace within the scope, then a suffix will be added.

The collision management using a suffix is different from the C++ WithOpName: WithOpName adds the suffix after the operation name, within the same scope (thus Placeholder becomes Placeholder_1) while the Go SubScope adds the suffix after the scope name.

This difference generates completely different graphs, but although different (nodes are placed under different scopes) they’re equivalent computationally speaking.

Let’s change the placeholder definitions in order to define two different nodes, moreover, let’s Print the Scope name.

Let’s create the file attempt2.go changing the lines

	A := op.Placeholder(root, tf.Int64, op.PlaceholderShape(tf.MakeShape(2, 2)))
	x := op.Placeholder(root, tf.Int64, op.PlaceholderShape(tf.MakeShape(2, 1)))


	// define 2 subscopes of the root subscopes, called "input". In this
	// way we expect to have a input/ and a input_1/ scope under the root scope
	A := op.Placeholder(root.SubScope("input"), tf.Int64, op.PlaceholderShape(tf.MakeShape(2, 2)))
	x := op.Placeholder(root.SubScope("input"), tf.Int64, op.PlaceholderShape(tf.MakeShape(2, 1)))
	fmt.Println(A.Op.Name(), x.Op.Name())

Usual compile & run: go run attempt2.go. Results:

input/Placeholder input_1/Placeholder

Question time:

What do we have learned about the Tensorflow architecture? A node is completely identified by the Scope in which it is defined. The scope is the path that we have to follow from the root of the graph to reach the node. There are 2 way of defining nodes that execute the same operation: define the operation in a different Scope (Go style) or change the operation name (how Python automatically do or how we can do in C++)

We solved the problem of the duplicate node names but another problem landed on our terminal.

panic: failed to add operation "MatMul": Value for attr 'T' of int64 is not in the list of allowed values: half, float, double, int32, complex64, complex128

Why the MatMul node definition panics? We just want to multiply two tf.int64 matrices! It looks like that int64 is the only type that MatMul does not accept.

Value for attr ‘T’ of int64 is not in the list of allowed values: half, float, double, int32, complex64, complex128

What’s this list? Why can we multiply 2 matrices of int32 but not of int64?

Let’s go solve this understanding what’s going on.

Lesson 3: Tensorflow typing system

Let’s dig into the source code looking for the C++ declaration of the MatMul operation:

    .Input("a: T")
    .Input("b: T")
    .Output("product: T")
    .Attr("transpose_a: bool = false")
    .Attr("transpose_b: bool = false")
    .Attr("T: {half, float, double, int32, complex64, complex128}")
Multiply the matrix "a" by the matrix "b".
The inputs must be two-dimensional matrices and the inner dimension of
"a" (after being transposed if transpose_a is true) must match the
outer dimension of "b" (after being transposed if transposed_b is
*Note*: The default kernel implementation for MatMul on GPUs uses
transpose_a: If true, "a" is transposed before multiplication.
transpose_b: If true, "b" is transposed before multiplication.

This line defines an interface for the MatMul operation: in particular, we can see that using the REGISTER_OP macro we’re declaring the op’s:

  • Name: MatMul
  • Parameters: a, b
  • Attributes (optional parameters): transpose_a, transpose_b
  • Supported types for the template T: half, float, double, int32, complex64, complex128
  • Output Shape: automatically inferred
  • Documentation

This macro call doesn’t contain any C++ code, but it tells us that when defining an operation, even though it uses a template, we have to specify the list of the supported type for the specified type T (or attribute). In practice, the attribute .Attr("T: {half, float, double, int32, complex64, complex128}") constraints the type T to be one value of that list.

As we can read from the tutorial even when using a Template T we have to explicitly register the kernel for every supported overload. The kernel is the CUDA-way for referring to C/C++ functions that will be executed in parallel.

The MatMul author, thus, decided to support only the previously listed types and to not support the int64. There are 2 possible reasons:

  1. Oversight: it’s possible, the Tensorflow’s authors are humans!
  2. Supporting devices where int64 operations are not fully supported, thus this specific implementation of the kernel wouldn’t be general enough to run on every supported hardware.

Coming back to our panic: the fix is obvious. We have to pass parameters of a supported type to MatMul.

Let’s create attempt3.go changing every line that refers to an int64 with an int32.

There’s just a thing to note: the Go bindings have their own set of types, with a 1:1 mapping (almost complete) with the Go types. When we feed values into the graph we have to respect the mapping (for instance feed int32 when defining tf.Int32 placeholders). The same thing has to be done when fetching values from the graph. The *tf.Tensor type returned from a Tensor evaluation, has the Value() method that returns an interface{} that must be converted to the correct type (that we know from the graph construction).

Usual go run attempt3.go. Results

input/Placeholder input_1/Placeholder
[[210] [-210]]


Here’s a Gist with the complete code of attempt3 in case you’d like too build and run it (also remember that’s a Gist, you can contribute if you see something that can be improved!)

Question time

What do we have learned about the Tensorflow architecture? Every operation has its own set of kernel associated. Tensorflow, seen as a descriptive language, is a strong typed language. It not only has to respect the C++ typing rules, but it also has the capability of implementing only certain types that are specified during the op regisration phase.


Using Go for defining and executing a graph gave us the opportunity for a better understanding of the underlying Tensorflow structure. Using a trial-and-error approach we solved this simple problem and step by step we learned something new about the graph, its nodes and the typing system.

If you find this article useful, feel free to share it using the buttons below!

Don't you want to miss the next article? Do you want to be kept updated?
Subscribe to the newsletter!

Related Posts

The (Hidden?) Costs of Vertex AI Resource Pools: A Cautionary Tale

In the article "Custom model training & deployment on Google Cloud using Vertex AI in Go" we explored how to leverage Go to create a resource pool and train a machine learning model using Vertex AI's allocated resources. While this approach offers flexibility, there's a crucial aspect to consider: the cost implications of resource pools. This article details my experience with a sudden price increase in Vertex AI and the hidden culprit – a seemingly innocuous resource pool.

Building a RAG for tabular data in Go with PostgreSQL & Gemini

In this article we explore how to combine a large language model (LLM) with a relational database to allow users to ask questions about their data in a natural way. It demonstrates a Retrieval-Augmented Generation (RAG) system built with Go that utilizes PostgreSQL and pgvector for data storage and retrieval. The provided code showcases the core functionalities. This is an overview of how the "chat with your data" feature of fitsleepinsights.app is being developed.

Using Gemini in a Go application: limits and details

This article explores using Gemini within Go applications via Vertex AI. We'll delve into the limitations encountered, including the model's context window size and regional restrictions. We'll also explore various methods for feeding data to Gemini, highlighting the challenges faced due to these limitations. Finally, we'll briefly introduce RAG (Retrieval-Augmented Generation) as a potential solution, but leave its implementation details for future exploration.

Custom model training & deployment on Google Cloud using Vertex AI in Go

This article shows a different approach to solving the same problem presented in the article AutoML pipeline for tabular data on VertexAI in Go. This time, instead of relying on AutoML we will define the model and the training job ourselves. This is a more advanced usage that allows the experienced machine learning practitioner to have full control on the pipeline from the model definition to the hardware to use for training and deploying. At the end of the article, we will also see how to use the deployed model. All of this, in Go and with the help of Python and Docker for the custom training job definition.

Integrating third-party libraries as Unreal Engine plugins: solving the ABI compatibility issues on Linux when the source code is available

In this article, we will discuss the challenges and potential issues that may arise during the integration process of a third-party library when the source code is available. It will provide guidance on how to handle the compilation and linking of the third-party library, manage dependencies, and resolve compatibility issues. We'll realize a plugin for redis plus plus as a real use case scenario, and we'll see how tough can it be to correctly compile the library for Unreal Engine - we'll solve every problem step by step.

AutoML pipeline for tabular data on VertexAI in Go

In this article, we delve into the development and deployment of tabular models using VertexAI and AutoML with Go, showcasing the actual Go code and sharing insights gained through trial & error and extensive Google research to overcome documentation limitations.

Advent of Code 2022 in pure TensorFlow - Day 12

Solving problem 12 of the AoC 2022 in pure TensorFlow is a great exercise in graph theory and more specifically in using the Breadth-First Search (BFS) algorithm. This problem requires working with a grid of characters representing a graph, and the BFS algorithm allows us to traverse the graph in the most efficient way to solve the problem.

Advent of Code 2022 in pure TensorFlow - Day 11

In this article, we'll show how to solve problem 11 from the Advent of Code 2022 (AoC 2022) using TensorFlow. We'll first introduce the problem and then provide a detailed explanation of our TensorFlow solution. The problem at hand revolves around the interactions of multiple monkeys inspecting items, making decisions based on their worry levels, and following a set of rules.

Advent of Code 2022 in pure TensorFlow - Day 10

Solving problem 10 of the AoC 2022 in pure TensorFlow is an interesting challenge. This problem involves simulating a clock signal with varying frequencies and tracking the state of a signal-strength variable. TensorFlow's ability to handle complex data manipulations, control structures, and its @tf.function decorator for efficient execution makes it a fitting choice for tackling this problem. By utilizing TensorFlow's features such as Dataset transformations, efficient filtering, and tensor operations, we can create a clean and efficient solution to this intriguing puzzle.

Advent of Code 2022 in pure TensorFlow - Day 9

In this article, we'll show two different solutions to the Advent of Code 2022 day 9 problem. Both of them are purely TensorFlow solutions. The first one, more traditional, just implement a solution algorithm using only TensorFlow's primitive operations - of course, due to some TensorFlow limitations this solution will contain some details worth reading (e.g. using a pairing function for being able to use n-dimensional tf.Tensor as keys for a mutable hashmap). The second one, instead, demonstrates how a different interpretation of the problem paves the way to completely different solutions. In particular, this solution is Keras based and uses a multi-layer convolutional model for modeling the rope movements.