Face Miner: data mining applied to face detection

Paolo Galeone

Face Detection & Data Mining

Let's see what Wikipedia says about these two topics:

"Face detection can be regarded as a specific case of object-class detection.
In object-class detection, the task is to find the locations and sizes of all objects in an image that
belong to a given class. Face-detection algorithms aim to
detect frontal human faces." [1]

"Data Mining is the computational process of discovering
patterns in large data sets" [2]

[1] https://en.wikipedia.org/wiki/Face_detection
[2] https://en.wikipedia.org/wiki/Data_Mining

How can these two topics be related?

Let's think of an image as a set of pixels. Each pixel has its local
and non-local characteristics (position, color, neighborhood...).
Therefore, a set of images of the same subject will share the
same characteristics.

A human face has its own characteristics, and we can detect a human face just
by looking for those characteristics.

It follows that we can use Data Mining to discover patterns in image datasets and use
the discovered patterns as a set of characteristics that help us with the Face Detection task.

Mining Maximal Frequent Patterns

The MAFIA [3] algorithm can efficiently
generate maximal frequent patterns from
a database.

Treating a dataset of face images
as a transaction database, in which every
row (transaction) is an image, we can use
MAFIA to mine the pixels that co-occur
in a human face. Those feature pixels
form a maximal frequent pattern.
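To make the idea concrete, here is a minimal sketch of what "maximal frequent pattern" means: a set of items that appears in at least a given fraction of the transactions and has no frequent proper superset. This is a brute-force search written for illustration only, not the MAFIA algorithm; the function name, the toy transactions and the `min_support` value are assumptions.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force search for maximal frequent itemsets (NOT MAFIA itself)."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    maximal = []
    # Visit candidates from the largest to the smallest size, so every
    # frequent superset of a candidate has already been examined.
    for size in range(len(items), 0, -1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t) / n
            is_covered = any(cset < m for m in maximal)
            if support >= min_support and not is_covered:
                maximal.append(cset)
    return maximal


# Toy "images": every transaction is the set of marked pixel ids of one image.
images = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 5}),
    frozenset({1, 2, 3}),
]
patterns = maximal_frequent_itemsets(images, min_support=0.75)
```

In this toy database the pixels 1, 2 and 3 co-occur in three images out of four, so they form the only maximal frequent pattern at 75% support. The exhaustive enumeration is exponential in the number of items, which is why a dedicated algorithm is needed on real images.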

To use the MAFIA algorithm, a preprocessing step is required, because the range of intensity values
is too wide to extract useful information: there are too many items in the basket!
In fact, MAFIA solves the same frequent-itemset problem as the Apriori algorithm frequently used in market
basket analysis.

[3] http://himalaya-tools.sourceforge.net/Mafia/

Preprocessing

The preprocessing steps are:

Histogram equalization: in order to improve contrast
Smoothed derivative along the horizontal axis, using the Sobel operator
Thresholding of the derivative magnitude
Dilation using a structuring element with a cross shape

The image below shows the result of each preprocessing step.

The resulting image is called the edge image. Its complement is called the non-edge image.
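The four steps above can be sketched in plain Python on an image stored as a list of rows of 8-bit intensities. This is only an illustration under stated assumptions: the threshold value, the 3x3 size of the cross, the border handling and the choice of storing edge pixels as 1 are not taken from Face Miner.

```python
def equalize_histogram(img):
    """Spread the intensities of an 8-bit grayscale image over 0..255."""
    flat = [p for row in img for p in row]
    hist = [0] * 256
    for p in flat:
        hist[p] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == len(flat):  # constant image: nothing to equalize
        return [row[:] for row in img]
    scale = 255 / (len(flat) - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in img]


def sobel_x(img):
    """Horizontal derivative with the 3x3 Sobel kernel (borders replicated)."""
    h, w = len(img), len(img[0])

    def px(r, c):
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    weights = ((-1, 1), (0, 2), (1, 1))  # (row offset, smoothing weight)
    return [[sum(k * (px(r + dr, c + 1) - px(r + dr, c - 1)) for dr, k in weights)
             for c in range(w)] for r in range(h)]


def dilate_cross(binary):
    """Dilation with a 3x3 cross-shaped structuring element."""
    h, w = len(binary), len(binary[0])
    cross = ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1))
    return [[int(any(0 <= r + dr < h and 0 <= c + dc < w and binary[r + dr][c + dc]
                     for dr, dc in cross))
             for c in range(w)] for r in range(h)]


def edge_image(img, threshold=100):  # threshold value is an assumption
    grad = sobel_x(equalize_histogram(img))
    binary = [[int(abs(g) > threshold) for g in row] for row in grad]
    return dilate_cross(binary)


def complement(binary):
    return [[1 - v for v in row] for row in binary]


# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = edge_image(step)
non_edges = complement(edges)
```

On the step image the derivative is non-zero only on the two columns next to the intensity jump, and the dilation widens that stripe by one pixel on each side.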

Performance:

Instead of computing the average and the variance directly on the raw image, Face Miner uses
integral images, which give these values for any rectangular region in constant time.
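As a sketch of why this works: with one integral image of the pixel values and one of their squares, the sum over any rectangle needs four lookups, and the variance follows from E[x^2] - E[x]^2. The function names below are hypothetical and the code is an illustration, not the Face Miner implementation.

```python
def integral_images(img):
    """Return the integral images of the values and of the squared values."""
    h, w = len(img), len(img[0])
    s = [[0] * (w + 1) for _ in range(h + 1)]
    sq = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            v = img[r][c]
            s[r + 1][c + 1] = v + s[r][c + 1] + s[r + 1][c] - s[r][c]
            sq[r + 1][c + 1] = v * v + sq[r][c + 1] + sq[r + 1][c] - sq[r][c]
    return s, sq


def rect_sum(table, top, left, height, width):
    """Sum over a rectangle using four lookups in an integral image."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def mean_and_variance(s, sq, top, left, height, width):
    """Mean and (population) variance of a rectangle in constant time."""
    n = height * width
    mean = rect_sum(s, top, left, height, width) / n
    variance = rect_sum(sq, top, left, height, width) / n - mean * mean
    return mean, variance


img = [[1, 2],
       [3, 4]]
s, sq = integral_images(img)
whole = mean_and_variance(s, sq, 0, 0, 2, 2)         # all four pixels
right_column = mean_and_variance(s, sq, 0, 1, 2, 1)  # pixels 2 and 4
```

The two integral images are built once per image; after that, the cost of a region query does not depend on the region size.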

Build the face detector

Based on the positive and negative feature patterns mined, we adopt a coarse-to-fine
strategy to construct the face detector, which consists of three cascaded classifiers.

Variance classifier: to prune windows with a small variance.
Face features classifier: to select the windows with the most facial features.
Support Vector Machine: to refine the results.

In the detection phase, a sliding window approach is
used to search for the faces in an image.
The input image is repeatedly downsampled to build an image pyramid,
and a window of size 19x19 is extracted from every location
of every pyramid level.
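The search can be sketched as two nested generators: one walks the pyramid levels, the other slides the 19x19 window over each level. The downsampling factor of 2, the nearest-neighbour downsampling and the step of 1 pixel are assumptions made for this example.

```python
def downsample(img):
    """Halve the image size by keeping every second pixel (an assumption)."""
    return [row[::2] for row in img[::2]]


def pyramid(img, min_size=19):
    """Yield (scale, level) pairs until a level is smaller than the window."""
    level, scale = img, 1
    while len(level) >= min_size and len(level[0]) >= min_size:
        yield scale, level
        level, scale = downsample(level), scale * 2


def sliding_windows(img, size=19, step=1):
    """Yield (scale, top, left, window) for every window of every level."""
    for scale, level in pyramid(img, size):
        h, w = len(level), len(level[0])
        for top in range(0, h - size + 1, step):
            for left in range(0, w - size + 1, step):
                window = [row[left:left + size] for row in level[top:top + size]]
                yield scale, top, left, window


blank = [[0] * 40 for _ in range(40)]
windows = list(sliding_windows(blank))
```

A detection found at position (top, left) on a level with scale s corresponds to a region of side 19 * s at (top * s, left * s) in the original image.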

Variance Classifier

This classifier improves the computational efficiency of the detection algorithm:
it prunes the windows that cannot contain a face at all.

The window is divided into 5 regions. For every region, the variance
is computed.

Then, the window gets discarded if either of two variance measures
is less than a predefined threshold.

This threshold must be learned with an iterative process.

If the window passes this first step, a second criterion, based on the differences between pixel averages, is
applied to it.

If the window also passes this second step, it is passed to the face features classifier; otherwise
it gets discarded.
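A rough sketch of the first step is shown below. The text above does not specify the layout of the 5 regions or the exact rejection rule, so both are assumptions here: the window is split into 5 horizontal bands and rejected as soon as one band has a variance below the threshold. The threshold value is hypothetical too.

```python
from statistics import pvariance


def passes_variance_check(window, threshold, bands=5):
    """Reject a window if any horizontal band is almost flat.

    The band layout and the rule are assumptions made for illustration.
    """
    h = len(window)
    edges = [round(i * h / bands) for i in range(bands + 1)]
    for start, stop in zip(edges, edges[1:]):
        pixels = [p for row in window[start:stop] for p in row]
        if pixels and pvariance(pixels) < threshold:
            return False
    return True


flat = [[100] * 19 for _ in range(19)]
textured = [[255 if (r + c) % 2 else 0 for c in range(19)] for r in range(19)]

flat_ok = passes_variance_check(flat, threshold=50)
textured_ok = passes_variance_check(textured, threshold=50)
```

A uniform window (a wall, the sky) is discarded immediately, so the more expensive classifiers never see it.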

Back to MAFIA...

Positive and negative maximal frequent patterns.

From the edge and non-edge images, we built 2 databases. The first is the positive patterns DB,
the second is the negative patterns DB. Every row contains the coordinates of the black pixels of one image.
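Building the two databases can be sketched as follows. The encoding of a pixel coordinate as a single item id (row * width + column) and the convention that edge pixels are stored as 1 are assumptions made for this example.

```python
def image_to_transaction(binary, marked=1):
    """Turn a binary image into the set of item ids of its marked pixels."""
    width = len(binary[0])
    return frozenset(r * width + c
                     for r, row in enumerate(binary)
                     for c, value in enumerate(row)
                     if value == marked)


def build_databases(edge_images):
    """One transaction per image: edge pixels (positive), the rest (negative)."""
    positive = [image_to_transaction(img, marked=1) for img in edge_images]
    negative = [image_to_transaction(img, marked=0) for img in edge_images]
    return positive, negative


toy_edges = [[[1, 0],
              [0, 1]]]
positive_db, negative_db = build_databases(toy_edges)
```

Each database is then a list of transactions that a frequent-pattern miner can consume directly.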

Applying MAFIA to these databases, we obtained the
positive & negative feature patterns.

Thus, we can automatically find the feature patterns that capture most of the facial features.

Face Feature classifier

The face feature classifier utilizes the positive and negative feature patterns mined to select the
images with the most facial features and discard the non-face images.

This classifier is rule-based and needs an iterative process to learn 6 different thresholds.
Each rule uses a pattern as a mask and computes the sum of the intensities of the masked pixels.
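The masking step can be sketched in a couple of lines. The representation of a pattern as a list of (row, column) pairs is an assumption, and the final comparison only illustrates the general shape of a rule: the actual rules and thresholds of Face Miner are not reproduced here.

```python
def masked_sum(window, pattern):
    """Sum the intensities of the window at the coordinates of the pattern."""
    return sum(window[r][c] for r, c in pattern)


window = [[1, 2],
          [3, 4]]
pattern = [(0, 0), (1, 1)]  # hypothetical feature pattern

total = masked_sum(window, pattern)
rule_holds = total >= 5     # hypothetical rule with a hypothetical threshold
```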

The thresholds were learned through a long
iterative process.
The learning process was based on the precision &
recall metrics and was stopped when satisfactory values
were reached.
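As a reminder, precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below shows how a single threshold could be chosen with these metrics; the candidate thresholds, the target values and the stopping rule are assumptions, not the procedure actually followed in Face Miner.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true/false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def first_satisfactory_threshold(scores, labels, candidates,
                                 min_precision, min_recall):
    """Return the first candidate threshold that reaches both targets."""
    for threshold in candidates:
        pairs = list(zip(scores, labels))
        tp = sum(1 for s, is_face in pairs if s >= threshold and is_face)
        fp = sum(1 for s, is_face in pairs if s >= threshold and not is_face)
        fn = sum(1 for s, is_face in pairs if s < threshold and is_face)
        precision, recall = precision_recall(tp, fp, fn)
        if precision >= min_precision and recall >= min_recall:
            return threshold
    return None


scores = [0.9, 0.8, 0.4, 0.3]       # hypothetical rule outputs
labels = [True, True, False, True]  # True means "face"
chosen = first_satisfactory_threshold(scores, labels, [0.2, 0.5, 0.85],
                                      min_precision=0.9, min_recall=0.6)
```

With 6 thresholds to tune together, the search space grows quickly, which explains why the real process was long.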

Any image that meets rules 1, 2 and 3 is passed to the SVM classifier,
otherwise it gets discarded.

Support Vector Machine

To increase the accuracy of the face detector, an SVM is used
in the last step.

Intuitively, the SVM tries to find the optimal hyperplane, the one
that maximizes the margin between the two classes.

The SVM needs to be trained in order to achieve
this result.

In Face Miner the SVM classifies vectors into two classes:
face and non-face.

Every vector is an n-dimensional vector whose components are features of the current image.

The features selected in Face Miner are the coefficients of the Haar
transform of the rows in the eye and mouth regions.
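For readers who have never seen it, here is a minimal sketch of a 1-D orthonormal Haar transform: at each level, neighbouring values are replaced by their scaled sum and difference. This textbook version requires a power-of-two length; how Face Miner handles its 19-pixel rows is not stated here, so the example uses a toy row of 4 values.

```python
import math


def haar_1d(values):
    """Orthonormal 1-D Haar transform (length must be a power of two)."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(v) for v in values]
    length = n
    while length > 1:
        half = length // 2
        sums = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diffs = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:length] = sums + diffs  # coarse part first, details after
        length = half
    return out


flat_row = haar_1d([1, 1, 1, 1])   # no detail: only the first coefficient is non-zero
edge_row = haar_1d([4, 2, 5, 5])
```

A flat row produces a single non-zero coefficient, while intensity changes show up in the detail coefficients. The transform preserves the energy (sum of squares) of the row.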

The true and false positives obtained from the first two classifiers
are used as training data for the SVM.

SVM: dimensionality reduction

Working with a dataset in which every image is 19x19, we select 2 rows at the eyes and 3 rows
at the mouth. Thus, 19x2 + 19x3 = 95 features per vector.

To speed up the SVM and to increase the chance of finding a good separating hyperplane, a
dimensionality reduction technique has been used: Principal Component Analysis (PCA).

PCA projects the vectors onto the directions that have the highest variance,
the ones with the greatest information content, and lets us keep only those.

After various tests, we found that the number of
components can be reduced from 95 to 29.
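The idea behind PCA can be sketched in plain Python: center the data, compute the covariance matrix, and extract its leading eigenvectors (here with power iteration and deflation). This is a toy illustration on 2-D points, not the code used in Face Miner; the function names, the starting vector and the number of iterations are assumptions.

```python
import math


def pca(vectors, n_components, iterations=200):
    """Return the mean and the top principal directions of the data."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    cov = [[sum(x[i] * x[j] for x in centered) / n for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(n_components):
        vec = [1.0 / math.sqrt(d)] * d
        for _ in range(iterations):  # power iteration
            nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0:
                break
            vec = [x / norm for x in nxt]
        eigenvalue = sum(vec[i] * cov[i][j] * vec[j]
                         for i in range(d) for j in range(d))
        components.append(vec)
        # Deflation: remove the direction just found from the covariance.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return mean, components


def project(vector, mean, components):
    """Coordinates of a vector in the reduced space."""
    return [sum((vector[j] - mean[j]) * c[j] for j in range(len(vector)))
            for c in components]


points = [[0, 0], [1, 2], [2, 4], [3, 6]]  # all on the line y = 2x
mean, components = pca(points, n_components=1)
reduced = project([3, 6], mean, components)
```

Here the 2-D points lie on a line, so one component is enough to describe them without loss. In Face Miner the same reasoning takes the vectors from 95 to 29 components.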

Even with only 29 features, a separating hyperplane can't be
found in the 29-d space.

Therefore a radial basis function (RBF) kernel has been used to
implicitly map the vectors into a higher-dimensional space, where the
two classes become linearly separable.
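The RBF kernel itself is a one-line formula, K(x, y) = exp(-gamma * ||x - y||^2): it measures the similarity of two vectors and equals the inner product of their images in the high-dimensional space, which is never computed explicitly. The sketch below only evaluates the kernel; the value of `gamma` is a free parameter chosen arbitrarily here.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * squared distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)  # identical vectors
near = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
far = rbf_kernel([0.0, 0.0], [5.0, 5.0], gamma=0.5)
```

Identical vectors have similarity 1, and the similarity decays towards 0 as the vectors move apart; `gamma` controls how fast.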

DEMO TIME

Face Miner vs Viola & Jones

The performance of Face Miner is poor in terms of speed, especially when compared to the speed
achieved by the state-of-the-art algorithm for the face detection task: the Viola & Jones algorithm.

In fact, the average time for a detection on a 320 x 243 image is 0.528 (FM) vs 0.017 (V&J): Face Miner is about 31 times slower.
The accuracy, on the other hand, is quite good.

The other main difference is the region detected.

In Face Miner the ROI is smaller
than the ROI detected
with Viola & Jones.

This can be one cause of the speed
difference between the 2 methods,
but it can be useful in
facial recognition applications, where we have to discard the noisy parts and focus on the face only.

Credits

License

Face Miner is based on the paper by Wen-Kwang Tsao et al.: "A Data Mining approach to Face
Detection".

Face Miner started from a critical discussion of the paper. Therefore Face Miner is not a
reproduction of the work described in the paper, but a different implementation resulting
from this critical discussion.

The original paper was not reproducible due to the use of private datasets and, as a consequence,
the resulting work was not publicly available.

Face Miner is FOSS, completely reproducible and available on
GitHub: https://github.com/galeone/face-miner

The datasets used are:
Yale Face Database B: http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html
MIT CBCL Face Database: http://cbcl.mit.edu/software-datasets/FaceData2.html

Face Miner: data mining applied to face detection
Copyright (C) 2016 Paolo Galeone <[email protected]>
This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0.