Fixed camera setup for object localization and measurement

A common task in Computer Vision is to use a camera for localize and measure certain objects in the scene. In the industry is common to use images of objects on a high contrast background and use Computer Vision algorithms to extract useful information.

There’s a lot of literature about the computer vision algorithm that we can use to extract the information, but something that’s usually neglected is how to correctly setup the camera in order to correctly address the problem. This post aim is to shed light on this subject.

The problem

The problem we aim to solve with Computer vision is to measure (in mm) objects of unknown shape, but with known thickness and max height and width values, while satisfying the constraint on the required minimum accuracy / error tolerance.

The camera setup for this kind of problem consists in:

  • Finding the correct working distance (distance between the object surface and the lenses)
  • Choose the right focal length.

In the following I’m going to show a possible 3 steps approach that can be used to correctly setup the camera.

Step 1: camera calibration & px/mm ratio calculation

Without entering in the detail of camera calibration, all we need to know is that the calibration process allow to represent the camera intrinsic parameters as a matrix. What the calibration does is to estimate the parameters of a pinhole camera model that approximate the camera that produces the set of photos given in input to the process.

where and are the focal distances in px and is the optical center in px.

In case of a squared sensor and are equal, but in general we can consider and consider a single focal length in px

The theory of the camera resectioning gives us the relation between the estimated focal lengths (in px) and the real focal length (in mm).

Since we’re considering we can just consider a single equation

In short, the estimated focal length in pixel is the real focal length (mm) times a scaling factor (px/mm).

This scaling factor is extremely important, because it measure the number of pixels in a millimeter of sensor.

Step 2: relationship between distance, object on sensor and object in scene

There’s a relation between the size of an object in the scene and the size of the object on the image plane. This relation comes from the thin lenses equiation.

Given the real size of the object (mm) and the size of the object in pixels, we know that

That in English it can be read as “the working distance in millimeters is the object real size in millimeter times the focal length in millimiters, divived by the object size on the image sensor”.

Hence it’s pretty easy to measure the size of the object in millimeters, when every other variable is know:

Step 3: satisfy constraints

There are 2 constraints that have to be satisfied when designing an object measurement system:

  1. Being able to measure the whole object
  2. Minimum accuracy

Step 3.1: FOV constraint

The constraint on the ability of measure the whole object can be satisfied analyzing the Field of View (FOV) of the camera.

Let , where is a “safety margin” used to compensate the camera calibration distortion removal and the need for a background around the object (usual values for are in range mm). Let and be the height and width of the sensor respectively (these values are available on the camera datasheet), then

Since the object can be in any possible orientation we can consider only the smaller FOV when finding the right distance for the camera (because this is the constrained one):

It’s obvious that is the angle (in radians) between the working distance and the “last ray of light” (in the sense of farther from the center) captured by the sensor. It’s also clear that the length of this ray of light changes according to the working distance.

The following images will make everything clear:

Field Of View

On the axis the position of is highlighted because we have to find the distance that makes the whole object (and the safety margin) visible. Hence:

This means that our working distance (noted as d in the picture) can be found exactly.

Please note that we’re creating an object measurement application, hence we can exploit other information regard the object in order to improve the precision. In fact, if we know in advance the set of thickness (in mm) that our objects could have, we can place our camera at a smaller distance and hence increase the accuracy (see next section).

In practice, the real working distance (that’s the one we’re really interested) can be found as:

The offset term is an optional term, that usually can be found on the camera datasheet, that’s the relative position of the sensor with respect to the measurement point (in the order of mm usually).

WARNING: The working distance computed in this way is a theoretical estimation of the real working distance since the camera model we’re using is the pinhole using, hence we’re using the thin lens equation as the foundation for our reasoning. In practice, the working distance to use in a real-world application must be computed using a software solution (exploiting the information about the size of a known object and the measured object in pixel) since the thin lens equations can’t model complex lens system in a precise way. Hence, you can use all the content of this article to get a rough estimation of the working distance in order to properly setup the camera physically.

Step 3.2: minimum accuracy constraint

The constraint on the accuracy can be formalized as follow:

where is the accuracy required and the 1 represents a lower bound (we can’t have a number of pixel less than 1 at a specified tolerance). In english: the number of pixel of the image per millimiter of the scene must be greather than 1.

If, for instance, the requirement is to have an accuracy of 3mm, the inequality becomes:

From the relation of the object in the scene on the object on the sensor (now with the real working distance) we can measure the number of pixels per millimiter, in fact

So, now is extremely easy to calculate the number of pixels per millimiter in the scene and check if the previous relation holds:

if the relation holds, we have correctly setup our system (but another safety margin can be to increase the number of pixels per accuracy required and hence change that 1 to something bigger).

Instead, if this relation does not hold we have to change the moving part of our system in order to satisfy every requirement:

  1. Check if the thickness of the object you’re measuring can help you making the camera closed to the object
  2. Change the focal length (and repeat every calculation, but only after a new calibration!)
  3. Evaluate the usage of more cameras and stitch the images together
  4. Last resort: change the camera(s)

One last tip: the relation allows also to measure the system accuracy (in px/mm), hence the number of pixels per single millimiter of the scene, just set and you’re done!


This article has been posted on the Zuru Tech Italy blog first and cross-posted here.

Don't you want to miss the next article? Do you want to be kept updated?
Subscribe to the newsletter!

Related Posts

FaceCTRL: control your media player with your face

After being interrupted dozens of times a day while coding with my headphones on, I decided to find a solution that eliminates the stress of pausing and re-playing the song I was listening to. The solution is machine learning / computer vision application developed with TensorFlow 2, OpenCV, and Playerctl. This article will guide you trough the step required to develop such an application.

Hands-On Neural Networks with TensorFlow 2.0

The first book on TensorFlow 2.0 and neural networks is out now!

Analyzing tf.function to discover AutoGraph strengths and subtleties - part 3

In this third and last part, we analyze what happens when tf.function is used to convert a function that contains complex Python constructs in its body. Should we design functions thinking about how they are going to be converted?

Analyzing tf.function to discover AutoGraph strengths and subtleties - part 2

In part 1 we learned how to convert a 1.x code to its eager version, the eager version to its graph representation and faced the problems that arise when working with functions that create a state. In this second part, we’ll analyze what happens when instead of a tf.Variable we pass a tf.Tensor or a Python native type as input to a tf.function decorated function. Are we sure everything is going to be converted to the Graph representation we expect?

Analyzing tf.function to discover AutoGraph strengths and subtleties - part 1

AutoGraph is one of the most exciting new features of Tensorflow 2.0: it allows transforming a subset of Python syntax into its portable, high-performance and language agnostic graph representation bridging the gap between Tensorflow 1.x and the 2.0 release based on eager execution. As often happens all that glitters is not gold: although powerful, AutoGraph hides some subtlety that is worth knowing; this article will guide you through them using an error-driven approach.

Tensorflow 2.0: Keras is not (yet) a simplified interface to Tensorflow

In Tensorflow 2.0 Keras will be the default high-level API for building and training machine learning models, hence complete compatibility between a model defined using the old tf.layers and the new tf.keras.layers is expected. In version 2 of the popular machine learning framework the eager execution will be enabled by default although the static graph definition + session execution will be still supported. In this post, you'll see that the compatibility between a model defined using tf.layers and tf.keras.layers is not always guaranteed.

Tensorflow 2.0: models migration and new design

Tensorflow 2.0 will be a major milestone for the most popular machine learning framework: lots of changes are coming, and all with the aim of making ML accessible to everyone. These changes, however, require for the old users to completely re-learn how to use the framework: this article describes all the (known) differences between the 1.x and 2.x version, focusing on the change of mindset required and highlighting the pros and cons of the new implementation.

Understanding Tensorflow's tensors shape: static and dynamic

Describing computational graphs is just a matter connecting nodes correctly. Connecting nodes seems a trivial operation, but it hides some difficulties related to the shape of tensors. This article will guide you through the concept of tensor's shape in both its variants: static and dynamic.

Camera calibration guidelines

The process of geometric camera calibration (camera resectioning) is a fundamental step for machine vision and robotics applications. Unfortunately, the result of the calibration process can vary a lot depending on various factors. There are a lot of empirical guidelines that have to be followed in order to achieve good results: this post will drive you through them.

Ethereum on Raspberry Pi: secure wallet and complete node with redundant storage

Ethereum is a relatively new player in the crypto-currencies ecosystem. If you are a researcher, an algorithmic trader or an investor, you could want to run an ethereum node to study, develop and store your ETH while contributing to the network good.