Thursday, 10 September 2015

DWT element for Gstreamer


A few words

Wavelet transformations play an important role in signal processing. They have a broad range of applications, including but not limited to audio/video compression, noise reduction (and blur effects), edge detection, and many more.
There are many excellent resources on the web, so here I omit the mathematics. In order to understand how DWTs work in practice, it will be beneficial to have a live gstreamer element which can be hooked into a pipeline, so that we can watch how it transforms images in real time. Here I demonstrate a simple Gstreamer element which uses the GNU Scientific Library (GSL) to perform a discrete wavelet transform on an image. The complete source code can be found on github.

About the element

The element is based on the standard gstreamer plugin template (boilerplate). At the time of writing it accepts only the video/x-raw,format=GRAY8 format and outputs the same. Also, the width and the height of the image should be equal, and that size should be \(2^N\) due to the restrictions of the gsl_wavelet routines.
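Since the gsl_wavelet routines only work on power-of-two sizes, the element has to reject any other resolution during caps negotiation. A minimal sketch of such a check (the function name is mine, not the plugin's):

```c
#include <stddef.h>

/* Returns 1 if n is a power of two (n = 2^N, n > 0), 0 otherwise.
 * The gsl_wavelet routines require this of each image dimension. */
static int is_power_of_two(size_t n)
{
    /* A power of two has exactly one bit set, so n & (n - 1) clears it. */
    return n > 0 && (n & (n - 1)) == 0;
}
```

This is why the pipelines below all use width=512,height=512.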
The element supports a few properties which allow us to experiment with different wavelet families (Haar, Daubechies, and biorthogonal B-spline), different member numbers, etc. The properties of the element are:
  • wavelet: fully specifies the wavelet family and member number. The wavelet is specified by a single letter 'h' (Haar), 'd' (Daubechies), or 'b' (B-spline) followed by the member number, i.e. 'h2' means Haar wavelet number 2. The centered versions are available if we add 'c': 'dc4' produces the centered Daubechies wavelet number 4.
  • band: specifies whether the element acts as a low-pass or high-pass filter.
  • inverse: whether or not an inverse DWT should be applied after the filtering.
  • cutoff: the number of modes in the wavelet basis to be zeroed.
In the rest of the text we are going to use mainly the Haar wavelet with number \(2\), unless explicitly specified otherwise. The structure of the data after transformations with the Daubechies and B-spline wavelets is similar; the most important difference is that the basis vectors of these wavelets overlap.
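To make the wavelet string format concrete, here is a hypothetical sketch of how a string like 'h2' or 'dc4' could be decomposed into family, centered flag, and member number. The actual element maps these onto the corresponding gsl_wavelet types; the names below are illustrative, not the plugin's real code:

```c
#include <ctype.h>
#include <stdlib.h>

typedef enum { WAV_HAAR, WAV_DAUBECHIES, WAV_BSPLINE, WAV_INVALID } wavelet_family;

/* Hypothetical parser for strings like "h2", "d4" or "dc4":
 * a family letter, an optional 'c' for the centered variant,
 * then the member number. */
static wavelet_family parse_wavelet(const char *s, int *centered, int *member)
{
    wavelet_family f;
    switch (s[0]) {
    case 'h': f = WAV_HAAR; break;
    case 'd': f = WAV_DAUBECHIES; break;
    case 'b': f = WAV_BSPLINE; break;
    default:  return WAV_INVALID;
    }
    s++;
    *centered = (*s == 'c');
    if (*centered)
        s++;
    if (!isdigit((unsigned char)*s))
        return WAV_INVALID;
    *member = atoi(s);
    return f;
}
```

With this scheme, "hc2" (used in the pipelines below) parses as the centered Haar wavelet number 2.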

Plugging the element into a working pipeline

Now that we have a brief description of the element, let's see what we can do with it. First off, we simply attach the element to a pipeline just to make sure its CAPS are compatible with the rest of the gstreamer elements. We can achieve this with the pipeline:
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=h2 band=high inverse=true cutoff=0 ! videoconvert ! ximagesink
The element takes the image data, transforms it to the wavelet domain, and then transforms it back. The net effect in the above pipeline is that our element serves just as a very computationally expensive identity element. The output looks like
Fig 1. Original image

An image in wavelet domain

Now we can actually see the wavelet transformation our element performs if we disable the inverse transformation:
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=h2 band=high inverse=false cutoff=0 ! videoconvert ! ximagesink
Now the element transforms the input image in-place into the wavelet basis but does not transform it back. The output looks like
Fig 2. The image in wavelet domain
The above figure is an important demonstration of the way an image is stored after a wavelet transform. The modes to the top left are the "slow" modes; further down and to the right are the "fast" modes. Let's assume we have a black and white image of size \(2^3 \times 2^3\). It can be represented as the matrix:
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{18} \\
a_{21} & a_{22} & \cdots & a_{28} \\
\vdots & \vdots & \ddots & \vdots \\
a_{81} & a_{82} & \cdots & a_{88}
\end{pmatrix}
\]
where every entry \(a_{i j}\) represents the brightness of the corresponding point, i.e. the element \(a_{1 1}\) corresponds to the brightness of the top left corner. After the transformation, the data of the image in the wavelet domain looks like:
\[
\begin{pmatrix}
s & x^1_{1,1} & x^2_{1,1} & x^2_{1,2} & x^3_{1,1} & x^3_{1,2} & x^3_{1,3} & x^3_{1,4} \\
y^1_{1,1} & d^1_{1,1} & x^2_{2,1} & x^2_{2,2} & x^3_{2,1} & x^3_{2,2} & x^3_{2,3} & x^3_{2,4} \\
y^2_{1,1} & y^2_{1,2} & d^2_{1,1} & d^2_{1,2} & x^3_{3,1} & x^3_{3,2} & x^3_{3,3} & x^3_{3,4} \\
y^2_{2,1} & y^2_{2,2} & d^2_{2,1} & d^2_{2,2} & x^3_{4,1} & x^3_{4,2} & x^3_{4,3} & x^3_{4,4} \\
y^3_{1,1} & y^3_{1,2} & y^3_{1,3} & y^3_{1,4} & d^3_{1,1} & d^3_{1,2} & d^3_{1,3} & d^3_{1,4} \\
y^3_{2,1} & y^3_{2,2} & y^3_{2,3} & y^3_{2,4} & d^3_{2,1} & d^3_{2,2} & d^3_{2,3} & d^3_{2,4} \\
y^3_{3,1} & y^3_{3,2} & y^3_{3,3} & y^3_{3,4} & d^3_{3,1} & d^3_{3,2} & d^3_{3,3} & d^3_{3,4} \\
y^3_{4,1} & y^3_{4,2} & y^3_{4,3} & y^3_{4,4} & d^3_{4,1} & d^3_{4,2} & d^3_{4,3} & d^3_{4,4}
\end{pmatrix}
\]
Unlike the original form, where every entry has the same role and is equally important, in the above representation different elements have different importance. For example, the top-left element \(s\) plays a role similar to an average (although it is not exactly an average), so it contributes to every pixel in the standard form. The elements \(x^1, y^1, d^1\) also have a global contribution and carry information about the gradient from left to right, top to bottom, and along the diagonal. Next are the \(x^2, y^2, d^2\) elements, each of which carries information for only a half of the image (either the left or right half, the top or bottom half, or along the main diagonal). The pattern should be clear: as we move further from the origin \(s\), the modes start to carry more detailed, "higher frequency" information. Now that we know something about the meaning of the different modes in the wavelet domain, we can start to experiment by removing some of the modes and seeing how this transforms the image.
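The layout above can be reproduced in a few lines of plain C. The sketch below implements the orthonormal Haar transform directly; the element itself delegates this work to GSL, so this is only an illustration of the non-standard 2D scheme that produces the \(s\), \(x^l\), \(y^l\), \(d^l\) blocks:

```c
#include <math.h>
#include <stddef.h>

#define N 8  /* image is N x N, N a power of two */

/* One level of the 1D Haar step over len entries with a given stride:
 * the first half receives scaled pairwise sums ("slow" modes), the
 * second half scaled pairwise differences ("fast" modes). */
static void haar_step(double *a, size_t len, size_t stride)
{
    double tmp[N];
    const double c = sqrt(0.5);
    for (size_t i = 0; i < len / 2; i++) {
        tmp[i]         = (a[2*i*stride] + a[(2*i+1)*stride]) * c;
        tmp[len/2 + i] = (a[2*i*stride] - a[(2*i+1)*stride]) * c;
    }
    for (size_t i = 0; i < len; i++)
        a[i*stride] = tmp[i];
}

/* Non-standard 2D Haar transform: at each level do a row step and a
 * column step on the current top-left block, then halve the block.
 * This yields exactly the s / x^l / y^l / d^l layout shown above. */
static void haar2d(double img[N][N])
{
    for (size_t len = N; len >= 2; len /= 2) {
        for (size_t r = 0; r < len; r++) haar_step(&img[r][0], len, 1);
        for (size_t c = 0; c < len; c++) haar_step(&img[0][c], len, N);
    }
}
```

For a constant image all detail modes vanish and only \(s\) survives, which is why \(s\) behaves like an (appropriately scaled) average of the whole image.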

Using the element as high-pass filter

So far we have only demonstrated how an image looks when transformed to the wavelet domain. In the next two sections we will experiment with the filtering capabilities of the element. The first use is as a high-pass filter. In order to do that we need to remove the modes residing in the top-left corner of the data, starting with \(s\), then \(x^1, y^1, d^1\), and so on. We should keep in mind that in order to do the filtering uniformly across the whole image, we should manipulate all the entries up to a given level of detail. We can start by removing \(s\) (just one element); then we can remove the \(x^1, y^1, d^1\) entries, which means zeroing a \(2 \times 2\) block. In order to filter out the \(x^2, y^2, d^2\) entries as well, we would need to manipulate the top-left block of size \(4 \times 4\), and so on. First we transform the image and cut off the first 64 modes. The pipeline that does that is
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=hc2 band=high inverse=false cutoff=64 ! videoconvert ! ximagesink
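Internally, the high-pass cut amounts to zeroing the top-left cutoff×cutoff block of the wavelet-domain data. A sketch, assuming a row-major array of doubles as GSL uses (the function name is illustrative, not the element's actual code):

```c
#include <stddef.h>

/* High-pass filtering in the wavelet domain: zero the top-left
 * cutoff x cutoff block, which holds the lowest-frequency modes
 * (s, then x^1/y^1/d^1, and so on). data is an n x n image in
 * the wavelet domain, stored row-major. */
static void highpass_cutoff(double *data, size_t n, size_t cutoff)
{
    for (size_t i = 0; i < cutoff && i < n; i++)
        for (size_t j = 0; j < cutoff && j < n; j++)
            data[i * n + j] = 0.0;
}
```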
The image in wavelet domain looks like
Fig 3. Image in wavelet domain after applying high-pass filter
We can clearly see that the square block containing the lower frequency modes in the top left has been nullified. Now let's see what this filtering did to the image:
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=hc2 band=high inverse=true cutoff=64 ! videoconvert ! ximagesink
Fig 4. The inverse transformed image after applying high-pass filter
As we can see, our filter preserved only the high-frequency features of the image, i.e. edges and noise (the snowy block in the lower right corner). This way the element can be used as an imperfect edge detector. By removing even more of the lower frequency modes from the image, we can make the edge detection more accurate, but some of the initial edges are lost. Sometimes better results can be obtained if we switch to another wavelet family.
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=hc2 band=high inverse=true cutoff=128 ! videoconvert ! ximagesink
Fig 5. The inverse transformed image after applying high-pass filter II

Using the element as low-pass filter

Let's now try to use the element as a low-pass filter. If we take another look at Fig 2, we can see that there are 3 times more \(x^2, y^2, d^2\) modes than all the modes of lower order together. The same is valid for the modes of order \(x^3, y^3, d^3\): their number is 3 times the number of all lower frequency modes, and so on. This means that if we want to remove only the finest details from the image, we need to remove \(75\%\) of the total modes in the image. In the next pipeline we start with an image of size \(512 \times 512\) (having \(512 \times 512\) modes). Then we remove the highest detail modes, so we end up with \(256 \times 256\) modes, which is only a quarter of the initial data.
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=hc2 band=low inverse=false cutoff=256 ! videoconvert ! ximagesink
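The low-pass case is the mirror image of the high-pass one: keep the top-left block of low-frequency modes and zero every mode outside it. A sketch under the same assumptions (row-major doubles, illustrative name):

```c
#include <stddef.h>

/* Low-pass filtering in the wavelet domain: keep the top-left
 * cutoff x cutoff block of low-frequency modes and zero every
 * mode outside it. data is an n x n image, stored row-major. */
static void lowpass_cutoff(double *data, size_t n, size_t cutoff)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (i >= cutoff || j >= cutoff)
                data[i * n + j] = 0.0;
}
```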
Fig 6. Image in wavelet domain after applying low-pass filter
The next thing is to apply the inverse transform to the image and see how well it is preserved with \(25\%\) of the data.
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=hc2 band=low inverse=true cutoff=256 ! videoconvert ! ximagesink
Fig 7. The inverse transformed image after applying low-pass filter
If we compare with Fig 1, we see that the image was restored quite well from only a quarter of the initial data. Let's push further and remove the next level of details. This means we preserve only \(128 \times 128\) modes, which reduces the size of the data by a factor of \(16\).
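The data-reduction figures quoted here are just the ratio of the kept block to the full image, which a trivial helper makes easy to check:

```c
#include <stddef.h>

/* Fraction of modes kept when only the top-left k x k block of an
 * n x n wavelet-domain image survives the low-pass filter. */
static double kept_fraction(size_t k, size_t n)
{
    return (double)(k * k) / (double)(n * n);
}
```

Keeping \(256 \times 256\) of \(512 \times 512\) modes retains a quarter of the data; keeping \(128 \times 128\) retains one sixteenth.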
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=hc2 band=low inverse=false cutoff=128 ! videoconvert ! ximagesink
Fig 8. Image in wavelet domain after applying low-pass filter II
The inverse transformed image is:
Fig 9. The inverse transformed image after applying low-pass filter II
Again the image is quite well preserved. Let's make the ultimate push and leave only a \(32 \times 32\) block in the wavelet domain (cutoff=32). This is about \(0.4\%\) of the size of the initial data!
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=hc2 band=low inverse=true cutoff=32 ! videoconvert ! ximagesink
The image is still recognisable:
Fig 10. The inverse transformed image after applying low-pass filter III
Although this strategy looks very promising, one can argue that the image we started with wasn't very complex. That's true: a quality photo would indeed be ruined if we preserved only \(1\%\) of its data. Still, removing only the highest level of details would probably be a good approximation. Another application of the low-pass filter is removing noise from noisy images: we can see that by removing the high-frequency modes, the noisy part in the bottom right corner gets smoothed.