To make some of the more recent techniques easier to access for high-level developers, facetools abstracts away the details of the detection and recognition methods found in dlib. It wraps two face detection algorithms from dlib (including a state of the art deep learning method), as well as dlib's state of the art deep learning face recogniser.

A similar face searching tool called ‘facegrep’ (inspired by the Unix grep tool) is implemented on top of the facetools framework for when you want to conveniently find friends or family in your photo albums.

While many different methods and models exist for detecting edges, one simple way is to look for areas of rapid change in pixel values, i.e., looking at derivatives (calculus) of an image. The idea is that an edge is more likely to occur where there are large, sudden changes in contrast.

The rate of change is greatest where the first derivative is at a maximum. Since we are dealing with a discrete image rather than a continuous function, we cannot calculate a precise derivative at each point. Instead, we will approximate it by taking a finite difference between two points locally, e.g., f′(x) ≈ f(x + 1) − f(x).

Since directions matter, one needs to consider the direction of change in our 2D image. In practice, only the derivatives along the x and y axes are computed.

To emphasise the edges in the vertical direction, we would compute differences along the x axis (to estimate the change along x). Similarly, if we are looking for edges along the horizontal direction, we would compute differences along the y axis.

An example 3×3 kernel with uniform weights (*Prewitt kernel*, here for the x direction) might look like:

    -1  0  1
    -1  0  1
    -1  0  1

To give further emphasis to the row (or column) passing through the current pixel being considered, one could apply a larger weight to the differences of its closest neighbours.

For example (*Sobel kernel*, x direction):

    -1  0  1
    -2  0  2
    -1  0  1

In practice the Sobel kernel gives better edge highlights than the Prewitt kernel.

**Original**

**3×3 Sobel kernel along x axis (vertical lines)**

**3×3 Sobel kernel along y axis (horizontal lines)**
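As a rough sketch of how such a kernel is applied (the helper function and test image here are illustrative, assuming NumPy is available, and are not from any particular library):

```python
import numpy as np

def correlate2d(image, kernel):
    """Apply a 3x3 kernel at every interior pixel (borders left as zero)."""
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            region = image[i - 1:i + 2, j - 1:j + 2]
            out[i, j] = np.sum(region * kernel)
    return out

# Sobel kernel along the x axis: responds to vertical edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# A simple image: dark on the left, bright on the right, one vertical edge.
img = np.zeros((5, 6))
img[:, 3:] = 1.0

edges = correlate2d(img, sobel_x)
print(edges)
```

The response is largest at the columns straddling the dark-to-bright transition, and zero in the flat regions on either side.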

A similar game can be played for things involving the second derivative. We know the first derivative is at an extremum when the second derivative crosses zero. One mathematical operator that captures this is the Laplacian:

    ∇²f = ∂²f/∂x² + ∂²f/∂y²

This can be approximated using a finite difference method as:

    ∇²f(x, y) ≈ f(x+1, y) + f(x−1, y) + f(x, y+1) + f(x, y−1) − 4f(x, y)

Translating this to a convolution kernel:

     0  1  0
     1 -4  1
     0  1  0

To account for diagonal directions as well, one could also use:

     1  1  1
     1 -8  1
     1  1  1

**3×3 Laplacian kernel (x and y directions)**
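A minimal sketch of the zero-crossing behaviour, assuming NumPy (the helper and the step-edge test image are illustrative):

```python
import numpy as np

# 3x3 Laplacian kernel (x and y directions only).
laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

def apply_kernel(image, kernel):
    """Correlate a 3x3 kernel over interior pixels; borders stay zero."""
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i, j] = np.sum(image[i - 1:i + 2, j - 1:j + 2] * kernel)
    return out

# A step edge: the Laplacian response is zero in the flat regions and
# changes sign across the edge; the zero crossing marks the edge.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
resp = apply_kernel(img, laplacian)
print(resp[2])
```

The sign change between the two columns at the transition is the zero crossing the second-derivative approach looks for.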

Which edge detector performs better depends on the edge profile of your image. Derivative based filters are often sensitive to noise, so a de-noising step like a blur is often desirable somewhere in your processing pipeline.

Sharpening, on the other hand, emphasises differences between neighbouring pixel values, increasing the contrast between pixels.

The unsharp filter works in two steps. First, it identifies areas of high contrast in the image, essentially the edges. Then, it exaggerates these contrasts and adds them back to the original image. This gives you the intended sharpening effect.

Now the details.

Let S be a smoothing filter and P our picture. Calculate a gradient image G with:

    G(P) = P − S(P)

i.e., smooth the image, and subtract it from the original image. You can use any smoothing filter, including the Box or Gaussian filters.

This gradient image G emphasises the parts where the pixel values differ a lot from the average at a particular point, i.e., where there are sharp contrasts (like where edges occur).

**Original**

**G(P), where S is the Box blur with a 3×3 kernel.**

**G(P), where S is the Gaussian blur with a 3×3 kernel, and standard deviation of 2.**

Add a multiple of the G(P) calculated in Step 1 back to the original image P to obtain the sharpened image H:

    H = P + k · G(P)

where k is the “exaggeration” factor.

This increases the contrast in parts of the image that already had a lot of contrast, exaggerating it, hence giving a “sharper” look.
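The two steps can be sketched as follows, assuming NumPy and using a 3×3 box blur as the smoothing filter S (the function names are my own):

```python
import numpy as np

def box_blur(image):
    """3x3 box blur with zero padding at the borders (the smoothing filter S)."""
    padded = np.pad(image.astype(float), 1)
    out = np.zeros_like(image, dtype=float)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + 3, j:j + 3].mean()
    return out

def unsharp(image, k=1.0):
    """Step 1: G = P - S(P).  Step 2: H = P + k * G."""
    P = image.astype(float)
    G = P - box_blur(P)
    return P + k * G

# A step edge from 0 to 1: sharpening overshoots on both sides of it,
# increasing the apparent contrast (and producing the halo artifacts
# mentioned below when k is large).
img = np.zeros((5, 6))
img[:, 3:] = 1.0
sharp = unsharp(img, k=1.0)
print(sharp[2])
```

Note how the pixel values just before and after the edge are pushed below 0 and above 1 respectively; that overshoot is exactly the exaggerated contrast.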

**Original**

**Sharpened image H, with a Box filter using a 3×3 kernel**

**Sharpened image H, with a Gaussian filter using a 3×3 kernel and a standard deviation of 2**

You will notice that there are sometimes artifacts in the sharpened image (like above). In general, the more you exaggerate sharpening, the more noise you can potentially introduce.

In the next post, we will look at other examples of edge detection using convolution filters.

Now that we’ve seen what a linear filter is, we will look at some examples of commonly used kernels in image processing for things like blurring, and edge detection. For each, we will briefly discuss the rationale behind picking those particular kernels.

A combination of these different convolutions lets one do a surprisingly large number of things, including many computer vision tasks.

While some of these operations can be achieved using far more sophisticated approaches, applying a simple convolution often achieves pretty good results.

This post focuses on blurring in particular.

A simple averaging (box) filter and a Gaussian blur will be considered. In both cases, the key idea is that you are taking an average over a neighbouring area to produce pixel values that are closer to those of their neighbours.

This increased similarity makes the picture less defined, as visual features, which we associate with high-contrast values, start to blend into their neighbours.

A 3×3 box blur kernel looks like:

    (1/9) ×  1  1  1
             1  1  1
             1  1  1

As you can see, you really are just taking an average of the 9 pixel values around your chosen point.
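A quick numerical check of this, assuming NumPy:

```python
import numpy as np

# 3x3 box blur kernel: every neighbour gets the same weight, 1/9.
box = np.full((3, 3), 1.0 / 9.0)

# Averaging a constant patch leaves the value unchanged...
patch = np.full((3, 3), 7.0)
print(np.sum(patch * box))

# ...while a lone bright pixel is spread out over its neighbourhood.
spike = np.zeros((3, 3))
spike[1, 1] = 9.0
print(np.sum(spike * box))
```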

A Gaussian blur is similar, except instead of giving all the pixels the same weight, we weight them using the Gaussian function. For those that are mathematically inclined, it looks like

    G(x, y) = (1 / (2πσ²)) e^(−(x² + y²) / (2σ²)),

where σ is a standard deviation you choose, x is the distance along the x axis away from our centre, and y is the distance along the y axis away from our centre.

Using a Gaussian captures the idea that things that are far away lose their relevance quickly, so their weight should also get small quickly, but in a smooth way. You could equally choose another weighting function that penalises distance from the origin.

A 3×3 Gaussian blur kernel with standard deviation σ = 1 might look approximately like

    (1/50) ×  4   6  4
              6  10  6
              4   6  4

Where do these numbers come from, you ask? Let’s refer back to the Gaussian function. When we are at the origin, the distance from the origin is 0 along both the x and y axes, so that

    G(0, 0) = 1 / (2πσ²).

If we shift along the x axis by 1, then

    G(1, 0) = (1 / (2πσ²)) e^(−1/2),

so moving by a distance of 1 either vertically or horizontally gives you a Gaussian value that is roughly 0.61 times the size of the value at the origin.

If we move vertically by 1, then horizontally by 1, our Gaussian

    G(1, 1) = (1 / (2πσ²)) e^(−1)

is roughly 0.37 times the size of the value at the origin.

Hence, the chosen matrix is roughly proportional to these ratios: 10 at the centre, 6 for the horizontal and vertical neighbours, and 4 for the diagonals. If you sum up all the matrix entries, you end up with 50, so we divide the whole thing by 50 to normalise the sum (similar to the box blur).

Integers are chosen to make the arithmetic easier for a computer. Along the same train of thought, one could also approximate the weights with powers of 2 to make the arithmetic even faster.
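The construction above can be sketched as follows, assuming NumPy (the function name is my own). With σ = 1, the edge and corner weights come out to roughly 0.61 and 0.37 times the centre weight, as described:

```python
import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    """Build a normalised Gaussian kernel by sampling the Gaussian
    function at integer offsets from the centre."""
    half = size // 2
    ax = np.arange(-half, half + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()   # entries sum to 1

k = gaussian_kernel(3, sigma=1.0)
print(np.round(k, 3))

# Ratios relative to the centre weight: exp(-1/2) and exp(-1).
print(round(float(k[0, 1] / k[1, 1]), 2))
print(round(float(k[0, 0] / k[1, 1]), 2))
```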

Generated with OpenCV blur and GaussianBlur functions.

**Original**

**Box blur 31×31**

**Gaussian blur 31×31 with standard deviation 5**

In the next post we examine sharpening using another convolution.


This post aims to lay out some of the theory behind linear filters used in image processing. The aim is not to be as general or abstract as possible with the ideas, but rather to specialise towards implementation. Some Python code will be presented to illustrate how one can apply these filters.

In a future post, examples of common convolutions used in image processing will be presented.

For the rest of this post, **indexing** of matrices will start at 0, rather than 1. If you have not come across matrices before, you can think of them like 2D arrays. If F is a matrix, then F[i][j] refers to the entry of the matrix F at row i and column j.

Consider the following example. Let F be a 4×4 matrix representing an image, and let W be a 3×3 matrix of the coefficients used in the linear combinations (the **filter coefficients**). W is sometimes called a **kernel**, or a **mask**.

We define a linear filter L using W. After applying the filter, the new pixel value of the image at coordinate (1, 1) is

    L(F)[1][1] = Σ (k = −1…1) Σ (l = −1…1) F[1+k][1+l] · W[1+k][1+l]

Notice that it is just the weighted sum of the surrounding pixel values around the point (1, 1), with the matrix W providing the weights. This top-down, left-right multiply and add operation is sometimes called a **correlation operation**.

If instead we picked the point (0, 0), the pixel values that are out of bounds, e.g., F[−1][−1], are treated as having the value 0.

To be a little more general, if the size of W was (2N+1) × (2N+1), for some integer N, then

    L(F)[i][j] = Σ (k = −N…N) Σ (l = −N…N) F[i+k][j+l] · W[N+k][N+l].

With the way the filter is presented here, it is necessary to pick odd numbered dimensions (i.e., 2N+1). Note that you can always turn non-square, even dimensional kernels into equivalent square, odd dimensional kernels, so we will just consider the latter.

Alternatively, you can use a **convolution operator** rather than a correlation, i.e.,

    L(F)[i][j] = Σ (k = −N…N) Σ (l = −N…N) F[i−k][j−l] · W[N+k][N+l].

Applying the convolution operator to the example F and W yields a different result: the coefficients are applied in reverse order, from bottom to top, right to left. To get the same filter using a convolution operation, we would instead define W with its entries rotated by 180°.

In practice, a lot of kernels have nice rotational symmetry (e.g., rotating the values by 180° gives the same matrix), making it irrelevant whether you choose to apply a convolution or a correlation operation.

However, since convolutions have nice theoretical properties, there are good reasons why many people present these transformations using convolutions.

One of these nice properties is separability. If a linear filter is separable, then the key information in the kernel is contained in a single column (or row) matrix. This allows us to reduce the number of operations required at each point from (2N+1)² multiplications to just 2(2N+1), by convolving the image with the column kernel w, followed by a convolution with its **transpose** wᵀ (w as a row).

For example, if we take the same image as before, but now with a 3 × 1 column kernel w, then we can compute a separable filter by first convolving the image with w, and then convolving the result with the row kernel wᵀ. This is equivalent to the convolution with the 2D kernel w · wᵀ, which is the matrix product (linear algebra) of w, and wᵀ.

Unfortunately, not all linear filters are separable. While we won’t go into the details, a few comments won’t go amiss. You can determine whether your 2D kernel matrix is separable by checking whether it has **matrix rank** (linear algebra) one, or equivalently, that it has only one non-zero **singular value**. To actually calculate the column vector, one could take the first column of one of the unitary matrices obtained from the **singular value decomposition (SVD)**, scaled by the only non-zero singular value.
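As a sketch of these checks, assuming NumPy (the example kernels here are my own):

```python
import numpy as np

# A kernel built as an outer product w @ w.T is separable by construction...
w = np.array([[1.0], [2.0], [1.0]])
separable = w @ w.T

# ...whereas the 3x3 Laplacian kernel is not.
laplacian = np.array([[0.0,  1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0,  1.0, 0.0]])

for name, kern in [("outer product", separable), ("laplacian", laplacian)]:
    rank = np.linalg.matrix_rank(kern)
    print(name, "rank:", rank, "separable:", rank == 1)

# Recover a column factor from the SVD: the first left singular vector,
# scaled by the only non-zero singular value; the first right singular
# vector gives the row factor.
U, S, Vt = np.linalg.svd(separable)
col = U[:, 0] * S[0]
row = Vt[0, :]
print(np.allclose(np.outer(col, row), separable))
```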

These examples assume a monochrome bitmap image stored in a 2D array.

```python
# Returns the pixel value at coordinates (i, j) of the image
# after the (convolution) filter is applied.
def filter2D(image, kernel, i, j):
    result = 0.0
    kLen = len(kernel[0])
    width = len(image[0])
    height = len(image)
    N = (kLen - 1) // 2          # integer division
    for k in range(-N, N + 1):
        for l in range(-N, N + 1):
            # out-of-bounds pixels are treated as 0
            if 0 <= i - k < height and 0 <= j - l < width:
                result += image[i - k][j - l] * kernel[N + k][N + l]
    return result
```

```python
def convCol(image, kernel):
    # kernel is a 1D list of weights (the column kernel)
    kLen = len(kernel)
    width = len(image[0])
    height = len(image)
    result = [[0] * width for _ in range(height)]
    N = (kLen - 1) // 2
    for i in range(height):
        for j in range(width):
            for k in range(-N, N + 1):
                if 0 <= i - k < height:
                    result[i][j] += image[i - k][j] * kernel[N + k]
    return result

def convRow(image, kernel):
    kLen = len(kernel)
    width = len(image[0])
    height = len(image)
    result = [[0] * width for _ in range(height)]
    N = (kLen - 1) // 2
    for i in range(height):
        for j in range(width):
            for k in range(-N, N + 1):
                if 0 <= j - k < width:
                    result[i][j] += image[i][j - k] * kernel[N + k]
    return result

# Returns the entire image with the separable filter applied to it.
def separableFilter(image, kernel):
    fPrime = convCol(image, kernel)
    return convRow(fPrime, kernel)
```

Most of the time though, you will probably just want to use something like OpenCV, and get access to these sorts of functions for free.

[1] Computer Vision: Algorithms and Applications, Richard Szeliski, Springer, 2011. Author homepage: http://szeliski.org/Book

[2] Computer vision: filtering (lecture notes), Raquel Urtasun: http://www.cs.toronto.edu/~urtasun/courses/CV/lecture02.pdf

[3] Separable convolutions, Steve Eddins: http://blogs.mathworks.com/steve/2006/10/04/separable-convolution

[4] Separable convolutions – Part 2, Steve Eddins: http://blogs.mathworks.com/steve/2006/11/28/separable-convolution-part-2

[5] Basics of convolutions: http://matlabtricks.com/post-3/the-basics-of-convolution


Most classes have a set of lecture notes. These are a valuable source of learning, and revision for many students. They also give an opportunity for the lecturer to delve into a level of detail not possible during lectures due to various constraints.

In the modern age of the PowerPoint presentation (or Beamer for the mathematically inclined), teachers often produce another set of materials in the form of lecture slides, solely for presentation. They are often much lighter on the intricate details than the notes, but still overlap significantly with them. Many teachers find this activity rather loathsome, as they duplicate a lot of work in a different form.

Thankfully, LaTeX, together with the Beamer package, gives one the ability to maintain a single repository of material from which both lecture notes and lecture slides can be produced. If you are new to LaTeX, I recommend reading the introductory article from the LaTeX Project in the references below.

The Beamer package is best known for producing slide style presentations. It also has a less well known ability to tag sections of a document using an environment called a frame.

You can then produce two publications from the same latex source document, with the flexibility to tag content for notes only, slides only, or for both.
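As a rough sketch of how this works (the file names and content here are illustrative, not taken from the actual template): the shared material lives in one file, and two thin wrapper documents choose how it is rendered.

```latex
% content.tex -- shared by both outputs
\begin{frame}{Convolution}
  Everything inside a frame appears in both the slides and the notes.
\end{frame}

\mode<article>{%
  Long-form discussion that only appears in the lecture notes.
}

% slides.tex -- compile this for the presentation
\documentclass{beamer}
\begin{document}
\input{content}
\end{document}

% notes.tex -- compile this for the notes; beamerarticle makes the
% frame environment usable inside a regular article document
\documentclass{article}
\usepackage{beamerarticle}
\begin{document}
\input{content}
\end{document}
```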

The following PDFs are produced from the same collection of source documents.

Notes: Sample lecture notes

Slides: Sample lecture slides

Since it can be tedious and inconvenient for a lot of people to configure and set this up themselves, I produced a template system that people can use, with instructions on how to use it. For those who can make use of a Makefile (most Linux and Mac OS users), it’s as easy as typing the following in a terminal, at the project root directory:

` make `

to produce all documents, or

` make notes `

` make slides `

to generate notes, or slides, respectively.

You can find it listed on the Projects page, or follow this direct link: [Download]

[1] An introduction to LaTeX (LaTeX Project)

[2] Download LaTeX for your operating system (Linux, Mac OS, Windows, or use it online)

We’ll demonstrate a way to use recursive template functions to evaluate a factorial at compile time, and to do loop unpacking (often called loop unrolling). A rough evaluation of the efficacy of these hand micro-optimisations is also considered.

This means substituting a sum using a for-loop with the full summand on a single line. For example, replacing

```cpp
for (int i = 1; i <= 3; i++) sum += A[i];
```

with

```cpp
sum += A[1] + A[2] + A[3];
```

In principle, you might save on the overhead maintaining the loop, and take advantage of any optimisations for single line evaluations.

```cpp
#include <iostream>
using std::cout;

// Approach 1: recursion with a template class.
template<int n>
class sum {
public:
    static inline int add(int *A) { return A[n] + sum<n-1>::add(A); }
};

// Full specialisation acts as the base case, terminating the
// recursion at compile time.
template<>
class sum<0> {
public:
    static inline int add(int *A) { return A[0]; }
};

int main() {
    int A[3] = {2, 5, 10};
    cout << "adding " << sum<2>::add(A) << "\n";
    return 0;
}
```

```cpp
#include <iostream>
using std::cout;

// Approach 2: recursion with a template function.
template<int n>
inline int add(int *A) { return A[n] + add<n-1>(A); }

// Full specialisation acts as the base case.
template<>
inline int add<0>(int *A) { return A[0]; }

int main() {
    int A[3] = {2, 5, 10};
    cout << "adding " << add<2>(A) << "\n";
    return 0;
}
```

As long as the value is something that can be computed at compile time, this allows for constant time access to those values at run time.

```cpp
// A constexpr function may be evaluated at compile time.
constexpr int metafactorial(int n) {
    return n <= 1 ? 1 : n * metafactorial(n - 1);
}
```

```cpp
#include <iostream>
using std::cout;

// Compile time factorial via recursive template instantiation.
template<int i>
class Factorial {
public:
    enum { factorial = i * Factorial<i-1>::factorial };
};

template<>
class Factorial<1> {
public:
    enum { factorial = 1 };
};

int main() {
    cout << Factorial<10>::factorial << "\n";
    return 0;
}
```

So is this actually a good idea? As with almost everything in life, context is everything. Development time aside, let’s look at some metrics.

LinuxMint 18 with 4.4.0 kernel, running on an Intel i7 4770K with 32 GB RAM, and gcc/g++ 5.4.0.

The benchmark program is not particularly interesting (I can provide the source if people want it). It does a run time calculation on the factorial computation, and the summation computation.

The factorial computation uses n = 200000, and compares a regular recursive factorial function against the meta programming approach.

For the loop unpacking, it compares the for-loop, against the two approaches listed above. Stack and heap allocated arrays are tested. Elements are dynamically initialised to values between 0 and 2. The array sizes are 90000 each.

The enum recursion could not be tested for anything meaningfully large (even n = 20 was too much), as integer overflows were treated as compiler errors, and attempting to evaluate really large factorials also caused a compiler segmentation fault.

In principle, one should ideally run an isolated benchmark for each of those modules individually. I ran them in batch, only with the aim of giving a ballpark comparison.

The calculations are averaged over 5 repeated samples.

- **for-loop and normal recursive factorial function:** negligible compile time.
- **Full benchmark with -O0 optimisation:** ~ real 8m16.409s, user 8m15.776s, sys 0m0.684s
- **Full benchmark with -O2 optimisation:** ~ real 9m9.529s, user 9m11.992s, sys 0m3.296s

That’s a huge difference in compile time. The templating engine does a whole lot of extra work.

- **for-loop and normal recursive factorial function:** ~ 13 KB
- **Full benchmark with -O0 optimisation:** ~ 23.9 MB
- **Full benchmark with -O2 optimisation:** ~ 1.1 MB

The difference in binary sizes is stark, with 13 KB on the low end, and almost 24MB at the high end!

This was a lot harder to analyse, since modern compilers are magical unicorns that spread magic, sometimes even when you ask them not to. But let’s try anyway.

**Full benchmark with -O0 optimisation**

The *factorial calculation* had the most noticeable gain.

- Normal recursive function: ~ 7.1 e-3 s
- Approach 1 (function recursion): ~ 1.5 e-3 s

The constexpr function looks to be almost 5 times faster with these parameters. One would expect the meta programming approach to stay about constant as n got larger, whereas the normal recursive function would scale with n.

Now let’s try the *loop unpacking*.

Stack array

- for-loop: ~3.6 e-4 s
- Approach 1(template class): ~ 5.9 e-3 s
- Approach 2(template function): ~ 8.4 e-3 s

Heap array

- for-loop: ~ 3.3 e-4 s
- Approach 1 (template class): ~ 5.1 e-3 s
- Approach 2 (template function): ~ 6.0 e-3 s

What’s interesting is that the for loop produced faster code than the template methods. I speculate there could be several reasons for this, like the compiler being better able to recognise and optimise for loops, or extra baggage at the OS or hardware level from the templating itself (memory use, etc.).

What’s also interesting is the slight decrease in time taken to access the heap array.

**Full benchmark with -O2 optimisation**

Since this is the most commonly used optimisation level in gcc/g++, these results are probably of most interest to us.

*Factorial calculation:*

- Normal recursive function: ~ 5.1 e-5 s
- Approach 1 (function recursion): 0 s

While the meta programming approach is still faster, the normal recursive function has closed the gap significantly (it is about two orders of magnitude faster than at -O0), and for our choice of n, the difference is negligible.

Now let’s try the *loop unpacking*.

Stack array

- for-loop: ~ 7.1 e-5 s
- Approach 1 (template class): ~ 1.5 e-4 s
- Approach 2 (template function): ~ 1.5 e-4 s

Heap array

- for-loop: ~ 5.7 e-5 s
- Approach 1 (template class): ~ 4.9 e-5 s
- Approach 2 (template function): ~ 7.0 e-5 s

It seems like the compiler does a better job optimising for-loops than templates, with the exception of the template class method that is accessing the heap array. Approach 1 appears to outperform Approach 2.

This little experiment has raised some more questions about heap vs stack access times with this particular test setup. Interestingly, the slightly faster heap access does not seem to be a fluke as it was seen consistently across all tests.

The take home message, by and large, is that attempting to beat compilers at micro-optimisations is a fool’s errand for these examples. You should spend your time doing something else instead (unless of course, long compile times are your objective).

Computing literals may be an exception to this, as it consistently beat the run time of the naive method. However, as more compiler optimisations were enabled, the gap closed significantly. Nevertheless, if you are computing constants with non-trivial algorithmic run times for fast lookup, then it is worth considering. You could also just cache that information in a file, and load it into your program.

At the end of it all, if you still insist on using template meta programming for something like loop unpacking, you should favour using Approach 1 (template classes).

In C++14, one might try using STL tuples with variadic templates to gain efficiency. Then again, it might not help.


**Mr Potatohead**

**Kawekas**

**Kea**

**Te Urewera National Park**

**Te Urewera National Park**

**Te Urewera National Park**

**Mt Cook National Park – obligatory cliche Mt Cook photo.**

**Mt Cook National Park – Mueller Hut**

**Mt Cook National Park – Kitchener Peak**

**Mt Ruapehu – Whakapapa**

**That time I got snowed on sleeping in a hut**

**Mt Alfred**

**A spare nut from a totally safe swing bridge**

**Abel Tasman National Park**

**Mt Doom**

**Cliff’s friend’s cute puppy**

**Green things in America**

**This is what a frozen beach looks like**

**More frozen beach**