Fixed camera setup for object localization and measurement


A common task in Computer Vision is to use a camera to localize and measure certain objects in a scene. In industry, it is common to capture images of objects on a high-contrast background and use Computer Vision algorithms to extract useful information.

There’s a lot of literature about the computer vision algorithms we can use to extract that information, but what’s usually neglected is how to correctly set up the camera in the first place. This post aims to shed light on this subject.

The problem

The problem we aim to solve with Computer Vision is to measure (in mm) objects of unknown shape, but with known thickness \(T_o\) and maximum height \(H_o\) and width \(W_o\), while satisfying a constraint on the required minimum accuracy / error tolerance.

The camera setup for this kind of problem consists of:

  • Finding the correct working distance (the distance between the object surface and the lens)
  • Choosing the right focal length.

In the following I’m going to show a possible 3-step approach that can be used to correctly set up the camera.

Step 1: camera calibration & px/mm ratio calculation

Without going into the details of camera calibration, all we need to know is that the calibration process allows us to represent the camera intrinsic parameters as a \(3 \times 3\) matrix. What the calibration does is estimate the parameters of a pinhole camera model that approximates the camera that produced the set of photos given as input to the process.

\[A = \begin{pmatrix} f_x & \gamma & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}\]

where \(f_x\) and \(f_y\) are the focal lengths in px and \((c_x, c_y)\) is the optical center in px.

In the case of square pixels, \(f_x\) and \(f_y\) are equal, but in general we can consider \(f_x \approx f_y\) and work with a single focal length in px

\[f_{xy} = \frac{f_x + f_y}{2} \quad [px]\]

The theory of camera resectioning gives us the relation between the estimated focal lengths (in px) and the real focal length (in mm).

\[f_x = m_x \cdot f \quad , \quad f_y = m_y \cdot f\]

Since we’re considering \(f_{xy}\), we can reduce this to a single equation

\[f_{xy} = m \cdot f\]

In short, the estimated focal length in pixels is the real focal length \(f\) (mm) times a scaling factor \(m\) (px/mm).

\[m = \frac{f_{xy}}{f} \quad [\frac{px}{mm}]\]

This scaling factor is extremely important, because it measures the number of pixels per millimeter of sensor.
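As a minimal sketch of this step, assuming the intrinsic matrix has already been estimated (e.g. with OpenCV’s `cv2.calibrateCamera`) and that the lens focal length is known from the datasheet; all numeric values below are made up for illustration:

```python
import numpy as np

# Intrinsic matrix A as estimated by a calibration routine
# (e.g. cv2.calibrateCamera); all values here are made up.
A = np.array(
    [
        [2410.0, 0.0, 1224.0],  # f_x, gamma, c_x
        [0.0, 2408.0, 1020.0],  # 0,   f_y,   c_y
        [0.0, 0.0, 1.0],
    ]
)

f = 8.0  # real focal length in mm, from the lens datasheet

f_xy = (A[0, 0] + A[1, 1]) / 2.0  # single focal length in px
m = f_xy / f  # scaling factor in px/mm

print(f"f_xy = {f_xy:.1f} px, m = {m:.2f} px/mm")
```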

Step 2: relationship between distance, object on sensor and object in scene

There’s a relation between the size of an object in the scene and the size of the object on the image plane. This relation comes from the thin lens equation.

Given \(X\), the real size of the object (mm), and \(x\), the size of the object in pixels, we know that

\[\text{WD} = \frac{X \cdot f}{\frac{x}{m}} \quad \left[\frac{mm \cdot mm}{px \,/\, \frac{px}{mm}} = \frac{mm^2}{mm} = mm\right]\]

In English: “the working distance in millimeters is the object’s real size in millimeters times the focal length in millimeters, divided by the object’s size on the image sensor (in millimeters)”.

Hence it’s pretty easy to measure the size of the object in millimeters when every other variable is known:

\[X = \frac{\text{WD} \cdot \frac{x}{m}}{f}\]
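These two formulas translate directly into code. A minimal sketch, where the \(m\) and \(f\) values are assumed to come from the previous step (made-up numbers):

```python
def working_distance(X: float, x_px: float, m: float, f: float) -> float:
    """Working distance WD (mm) given the real object size X (mm)."""
    return (X * f) / (x_px / m)


def object_size(WD: float, x_px: float, m: float, f: float) -> float:
    """Real object size X (mm) given the working distance WD (mm)."""
    return (WD * (x_px / m)) / f


# Example: an object spanning 850 px seen from 600 mm away,
# with m ~ 300 px/mm and f = 8 mm (made-up values).
print(object_size(WD=600.0, x_px=850.0, m=300.0, f=8.0))  # 212.5 mm
```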

Step 3: satisfy constraints

There are 2 constraints that have to be satisfied when designing an object measurement system:

  1. Being able to measure the whole object
  2. Minimum accuracy

Step 3.1: FOV constraint

The constraint on the ability to measure the whole object can be satisfied by analyzing the Field of View (FOV) of the camera.

Let \(M_o = \max(W_o, H_o) + \delta\), where \(\delta\) is a “safety margin” used to compensate for the camera calibration’s distortion removal and the need for some background around the object (usual values for \(\delta\) are in the range \([50, 100]\) mm). Let \(h\) and \(w\) be the height and width of the sensor respectively (these values are available in the camera datasheet), then

\[\alpha_H = \text{FOV}_H = 2\arctan\left(\frac{h}{2f}\right) \quad,\quad \alpha_W = \text{FOV}_W = 2\arctan\left(\frac{w}{2f}\right)\]

Since the object can be in any possible orientation, we consider only the smaller FOV when finding the right distance for the camera (because it is the constraining one):

\[\alpha = \min(\alpha_H, \alpha_W)\]
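As a short sketch of the FOV computation (the sensor dimensions below are hypothetical datasheet values, not from any specific camera):

```python
import math

h, w = 4.8, 7.2  # sensor height and width in mm (hypothetical datasheet values)
f = 8.0          # focal length in mm

alpha_H = 2 * math.atan(h / (2 * f))  # vertical FOV in radians
alpha_W = 2 * math.atan(w / (2 * f))  # horizontal FOV in radians
alpha = min(alpha_H, alpha_W)         # the constraining angle

print(f"FOV_H = {math.degrees(alpha_H):.1f} deg, FOV_W = {math.degrees(alpha_W):.1f} deg")
```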

Note that \(\alpha\) is the angle (in radians) between the working distance \(\text{WD}\) and the “last ray of light” (the ray farthest from the center) captured by the sensor. It’s also clear that the length of this ray changes according to the working distance.

The following figure makes everything clear:

[Figure: Field of View]

On the \(y\) axis the position of \(M_o\) is highlighted, because we have to find the distance \(d\) that makes the whole object (plus the safety margin) visible. Hence:

\[\begin{cases} \text{WD} = L \cos \alpha \\ M_o = L \sin \alpha \end{cases} \quad \Rightarrow \quad \text{WD} = \frac{M_o}{\tan \alpha}\]

This means that our working distance (denoted as \(d\) in the picture) can be found exactly.

Please note that we’re creating an object measurement application, hence we can exploit other information about the object in order to improve the precision. In fact, if we know in advance the set of thicknesses (in mm) \(T = \left\{T_1, T_2, \dots, T_n\right\}\) that our objects could have, we can place our camera at a smaller distance and hence increase the accuracy (see the next section).

In practice, the real working distance (the one we’re really interested in) can be found as:

\[\text{WD}_r = \text{WD} - \min\{T\} - \text{offset}\]

The offset term is optional; it can usually be found in the camera datasheet and represents the relative position of the sensor with respect to the measurement point (usually in the order of \([0, 5]\) mm).
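Putting the pieces together, here is a sketch of the working-distance computation; the thickness set, safety margin, offset, and its sign convention are all illustrative assumptions:

```python
import math

W_o, H_o = 400.0, 300.0      # max object width / height in mm
delta = 80.0                 # safety margin in mm
M_o = max(W_o, H_o) + delta  # 480 mm

alpha = 0.583                # min(FOV_H, FOV_W) in radians, from the previous sketch
WD = M_o / math.tan(alpha)   # theoretical working distance, ~728 mm

T = [20.0, 35.0, 50.0]       # known possible object thicknesses in mm
offset = 2.0                 # sensor offset from the datasheet, in mm

# The sign of the offset term is an assumption; check your camera datasheet.
WD_r = WD - min(T) - offset  # real working distance, ~706 mm
print(f"WD = {WD:.0f} mm, WD_r = {WD_r:.0f} mm")
```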

WARNING: The working distance computed in this way is a theoretical estimation of the real working distance, since the camera model we’re using is the pinhole model, hence we’re using the thin lens equation as the foundation for our reasoning. In practice, the working distance to use in a real-world application must be computed using a software solution (exploiting the information about the size of a known object and the measured object in pixels), since the thin lens equation can’t model a complex lens system in a precise way. Hence, you can use the content of this article to get a rough estimation of the working distance in order to properly set up the camera physically.

Step 3.2: minimum accuracy constraint

The constraint on the accuracy can be formalized as follows:

\[\frac{\#px}{\Delta} \geq 1\]

where \(\Delta\) is the required accuracy and the 1 represents a lower bound (we can’t have fewer than one pixel for a specified tolerance). In English: the number of pixels of the image per \(\Delta\) millimeters of the scene must be at least 1.

If, for instance, the requirement is to have an accuracy of 3mm, the inequality becomes:

\[\frac{\#px}{3} \geq 1\]

From the relation between the object in the scene and the object on the sensor (now with the real working distance) we can compute the number of pixels per millimeter; in fact

\[X = \frac{\text{WD}_r \cdot \frac{x}{m}}{f} \Leftrightarrow x = \frac{X f m}{\text{WD}_r} = \frac{Xf_{xy}}{\text{WD}_r}\]

So now it’s extremely easy to calculate the number of pixels corresponding to \(\Delta\) millimeters of the scene and check whether the previous relation holds:

\[\frac{\Delta f m}{\text{WD}_r} \ge 1\]

If the relation holds, we have correctly set up our system (an extra safety margin can be obtained by increasing the required number of pixels per accuracy unit, i.e., replacing that 1 with something bigger).
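A minimal sketch of this check, reusing the (made-up) values from the previous sketches:

```python
Delta = 3.0    # required accuracy in mm
f = 8.0        # focal length in mm
m = 300.0      # scaling factor in px/mm, from step 1
WD_r = 706.0   # real working distance in mm, from step 3.1

px_per_Delta = (Delta * f * m) / WD_r  # ~10.2 px per 3 mm of scene
if px_per_Delta >= 1.0:
    print(f"Constraint satisfied: {px_per_Delta:.1f} px per {Delta} mm")
else:
    print("Constraint violated: adjust distance, focal length, or camera")
```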

Instead, if the relation does not hold, we have to change the moving parts of our system in order to satisfy every requirement:

  1. Check whether the thickness of the objects you’re measuring lets you move the camera closer to the object
  2. Change the focal length (and repeat every calculation, but only after a new calibration!)
  3. Evaluate the usage of more cameras and stitch the images together
  4. Last resort: change the camera(s)

One last tip: the relation \(x = \frac{X f m}{\text{WD}_r}\) also allows measuring the system accuracy (in px/mm), i.e. the number of pixels per single millimeter of the scene: just set \(\Delta = 1\) and you’re done!

Disclosure

This article was first posted on the Zuru Tech Italy blog and then cross-posted here.
