An introduction to convolutions for image processing

Many techniques in image processing involve transformations that make use of neighbouring information. One simple class of transformations replaces each pixel with a linear combination of some fixed set of its neighbours. This class of transformations can be used to perform operations like blurring, sharpening, and edge detection, and a surprisingly large amount of image manipulation can be done effectively by combining them. They are sometimes referred to as linear filters.

This post aims to lay out some of the theory behind linear filters used in image processing. The aim is not to present the ideas in their most general or abstract form, but rather to specialise towards implementation. Some Python code is presented to illustrate how one can apply these filters.

In a future post, examples of common convolutions used in image processing will be presented.

For the rest of this post, indexing of matrices will start at 0, rather than 1. If you have not come across matrices before, you can think of them as 2D arrays. If A is a matrix, then A(i,j) refers to the (i,j) entry of A, where i is the row index (the y coordinate) and j is the column index (the x coordinate).

Theory

Consider the following example. Let I be a 4×4 matrix representing an image,

I = \begin{bmatrix} 11 & 12 & 13 & 14 \\ 15 & 16 & 17 & 18 \\ 19 & 20 & 21 & 22 \\ 23 & 24 & 25 & 26 \end{bmatrix},

and H be a matrix of the coefficients used in the linear combinations (the filter coefficients),

H = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.

H is sometimes called a kernel, or a mask.

We define a linear filter F,

\displaystyle{F(i,j) := \sum_{k=-1}^1 \sum_{\ell = -1}^1 \Big( I(i + k, j + \ell) H(1+k,1+\ell)\Big)}.

After applying the filter, the new pixel value of the image at coordinate (1,1) is,

\begin{aligned} F(1,1) = 798 = \, &11 \cdot 1 + 12 \cdot 2 + 13 \cdot 3 \, + \\ &15 \cdot 4 + 16 \cdot 5 + 17 \cdot 6 \, + \\ &19 \cdot 7 + 20 \cdot 8 + 21 \cdot 9 \end{aligned}

Notice that it is just a weighted sum of the pixel values surrounding the point (1,1), with the matrix H providing the weights. This top-to-bottom, left-to-right multiply-and-add operation is sometimes called a correlation.

If we instead picked the point (0,1), some of the required pixel values, e.g., at (-1,1), would be out of bounds; these are treated as having the value 0 (i.e., the image is zero padded).
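
As an illustrative sketch (the helper correlateAt below is just for this post, not a library function), the definition above transcribes almost directly into Python, with nested lists standing in for matrices:

# A direct transcription of the correlation formula for a 3x3 kernel.
# Out-of-bounds pixels are treated as zero.
def correlateAt(image, kernel, i, j):
  total = 0
  for k in range(-1, 2):
    for l in range(-1, 2):
      if 0 <= i + k < len(image) and 0 <= j + l < len(image[0]):
        total += image[i + k][j + l] * kernel[1 + k][1 + l]
  return total

I = [[11, 12, 13, 14], [15, 16, 17, 18], [19, 20, 21, 22], [23, 24, 25, 26]]
H = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(correlateAt(I, H, 1, 1))  # 798, as in the worked example
print(correlateAt(I, H, 0, 1))  # 568; the row above is out of bounds and contributes nothing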

To be a little more general, if the size of H was (2N+1) \times (2N+1), for some integer N>0, then

\displaystyle{F(i,j) := \sum_{k=-N}^N \sum_{\ell=-N}^N I(i+k, j + \ell) H(N+k, N+\ell)}.

With the way the filter is presented here, it is necessary to pick odd dimensions (i.e., 2N+1). Note that you can always turn non-square or even-dimensional kernels into equivalent square, odd-dimensional kernels (for example, by padding with zeros), so we will only consider the latter.

Alternatively, you can use a convolution operator rather than correlation, i.e.,

\displaystyle{F(i,j) := \sum_{k=-N}^N \sum_{\ell = -N}^N I(i-k, j-\ell) H(N+k, N + \ell)}.

Applying the convolution operator with the example H, and I, yields

\begin{aligned} F(1,1) = 642 = \, &21 \cdot 1 + 20 \cdot 2 + 19 \cdot 3 \, + \\ &17 \cdot 4 + 16 \cdot 5 + 15 \cdot 6 \, + \\ &13 \cdot 7 + 12 \cdot 8 + 11 \cdot 9 \end{aligned}

Notice that the coefficients are applied in reverse order, from bottom to top and right to left, so to obtain the same filter as the correlation above using a convolution, we would instead define

H = \begin{bmatrix} 9 & 8 & 7\\ 6 & 5 & 4 \\ 3 & 2 & 1 \end{bmatrix}.
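
As a quick sanity check, assuming SciPy and NumPy are available, scipy.signal reproduces both worked values for the original H, with out-of-bounds pixels filled with zeros:

import numpy as np
from scipy.signal import convolve2d, correlate2d

I = np.arange(11, 27).reshape(4, 4)            # the example image
H = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

corr = correlate2d(I, H, mode='same', boundary='fill', fillvalue=0)
conv = convolve2d(I, H, mode='same', boundary='fill', fillvalue=0)
print(corr[1, 1], conv[1, 1])                  # 798 642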

In practice, many common kernels are symmetric under a 180^\circ rotation (rotating the entries by half a turn gives back the same matrix), in which case it is irrelevant whether you apply a convolution or a correlation.

However, convolutions have nicer theoretical properties (for example, convolution is commutative and associative), which is why many people present these transformations as convolutions.

Separability

One of these nice properties is the idea of separability. If a linear filter is separable, then the key information in the kernel is contained in a column (or row) matrix. This allows us to reduce the number of operations required at each point from (2N+1)^2 to 2(2N+1), by convolving the image with the column kernel H and then with its transpose H^T (H as a row).

For example, if we take the same image I as before, but now with

H = \displaystyle{\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \qquad H^T = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}} ,

then we can compute a separable filter F by first computing

\displaystyle{F^\prime(i,j) := \sum_{k=-1}^1 I(i-k, j) H(1+k, 0)},

and then

\displaystyle{F(i,j) := \sum_{\ell=-1}^1 F^\prime(i, j-\ell) H^T(0, 1+\ell)}.

This is equivalent to the convolution with the 2D kernel

\displaystyle{ \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} },

which is the matrix product of H and H^T (their outer product, in linear algebra terms).
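
To illustrate this equivalence, here is a small check (assuming SciPy and NumPy are available): applying the column kernel and then the row kernel gives exactly the same result as a single convolution with their outer product.

import numpy as np
from scipy.signal import convolve2d

I = np.arange(11, 27).reshape(4, 4)
col = np.array([[1], [2], [3]])                # column kernel H
row = np.array([[1, 2, 3]])                    # row kernel H^T

two_passes = convolve2d(convolve2d(I, col, mode='same'), row, mode='same')
one_pass = convolve2d(I, col @ row, mode='same')   # col @ row is the 2D kernel above
print(np.array_equal(two_passes, one_pass))        # True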

Unfortunately, not all linear filters are separable. While we won't go into the details, a few comments won't go amiss. A 2D kernel matrix is separable exactly when it has matrix rank one, or equivalently, when it has only one non-zero singular value. To actually recover the column and row vectors, one can take the first column of U and the first row of V^T from the singular value decomposition (SVD) K = U\Sigma V^T, with the single non-zero singular value absorbed into one of them.
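
Here is a minimal sketch of that check using NumPy (assuming it is available), applied to the 2D kernel from the example above:

import numpy as np

K = np.array([[1, 2, 3], [2, 4, 6], [3, 6, 9]], dtype=float)
print(np.linalg.matrix_rank(K))            # 1, so K is separable

U, s, Vt = np.linalg.svd(K)
col = U[:, 0] * s[0]                       # column kernel, scaled by the singular value
row = Vt[0, :]                             # row kernel
# col and row may differ from the original vectors by a scale factor,
# but their outer product recovers K exactly.
print(np.allclose(np.outer(col, row), K))  # True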

Source code

These examples assume a monochrome bitmap image stored in a 2D array (a list of rows, each a list of pixel values).

Applying a 2D convolution to one position

# Returns the value at coordinate (i,j) of the image after the
# filter is applied as a convolution. The kernel is assumed to be
# square with odd dimensions; out-of-bounds pixels are treated as zero.
def filter2D(image, kernel, i, j):
  result = 0.0
  kLen = len(kernel[0])
  wdth = len(image[0])
  hght = len(image)
  N = (kLen-1)//2            # integer division, so N stays an int
  for k in range(-N, N+1):
    for l in range(-N, N+1):
      if i-k>=0 and j-l>=0 and i-k<hght and j-l<wdth:
        result += image[i-k][j-l]*kernel[N+k][N+l]
  return result
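
For example, using the image and kernel from the Theory section, this function reproduces the convolution result computed earlier:

I = [[11, 12, 13, 14], [15, 16, 17, 18], [19, 20, 21, 22], [23, 24, 25, 26]]
H = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(filter2D(I, H, 1, 1))  # 642.0, matching the convolution example above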

Applying a separable filter to an image

# Convolves each column of the image with a 1D kernel,
# given as a flat list of odd length.
def convCol(image, kernel):
  kLen = len(kernel)
  width = len(image[0])
  height = len(image)
  result = [[0] * width for _ in range(height)]   # fresh row lists, no aliasing
  N = (kLen-1)//2
  for i in range(height):
    for j in range(width):
      for k in range(-N, N+1):
        if i-k>=0 and i-k<height:
          result[i][j] += image[i-k][j]*kernel[N+k]
  return result

# Convolves each row of the image with the same 1D kernel.
def convRow(image, kernel):
  kLen = len(kernel)
  width = len(image[0])
  height = len(image)
  result = [[0] * width for _ in range(height)]
  N = (kLen-1)//2
  for i in range(height):
    for j in range(width):
      for k in range(-N, N+1):
        if j-k>=0 and j-k<width:
          result[i][j] += image[i][j-k]*kernel[N+k]
  return result

# Returns the entire image with filter applied to it.
def separableFilter(image, kernel):
  fPrime = convCol(image, kernel)
  return convRow(fPrime, kernel)
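
For example, applying the separable filter with the column kernel from the Separability section (given here as a flat list):

I = [[11, 12, 13, 14], [15, 16, 17, 18], [19, 20, 21, 22], [23, 24, 25, 26]]
filtered = separableFilter(I, [1, 2, 3])
print(filtered[1][1])  # 516, the same as convolving with the 2D kernel [[1,2,3],[2,4,6],[3,6,9]]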

Most of the time though, you will probably just want to use something like OpenCV and get access to these sorts of functions for free.
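
For instance, a rough OpenCV equivalent of the example might look like the sketch below (assuming OpenCV and NumPy are installed). Note that cv2.filter2D computes a correlation, so the kernel has to be flipped to obtain a true convolution, and its default border handling differs from the zero padding used above, although that does not affect interior pixels.

import cv2
import numpy as np

image = np.arange(11, 27, dtype=np.float32).reshape(4, 4)
kernel = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)

correlated = cv2.filter2D(image, -1, kernel)               # correlation
convolved = cv2.filter2D(image, -1, cv2.flip(kernel, -1))  # convolution via a flipped kernel
print(correlated[1, 1], convolved[1, 1])                   # 798.0 642.0 at the interior point (1,1)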

