Thursday, 4 October 2018

Solving Atari's Lunar Lander with TensorFlow and Evolution


Overview
In this post we discuss the implementation of a very simple evolutionary machine learning algorithm which aims to solve Atari's Lunar Lander game using OpenAI Gym.
The algorithm is a very basic implementation of an Evolution Strategy. The idea is inspired by Hardmaru's A Visual Guide to Evolution Strategies and Donal Byrne's article PyTorch Landing A Rocket With Simple Reinforcement Learning. Here we provide a very simple TensorFlow implementation. The complete source code can be found on GitHub. The requirements of this project are:
  • ipython-jupyter
  • numpy
  • Tensorflow
  • OpenAI Gym
  • matplotlib (optional)
Idea of the algorithm
The algorithm can be seen as a very "extreme" version of an evolutionary algorithm- only the best one survives. Here is how it works:
  1. Generate \( N_a \) agents.
  2. Every agent plays \( N_g \) games. Calculate the average score per agent.
  3. Only the agent with the highest score survives. Produce the next generation by mutating the best performing agent. In our case we use Gaussian noise centered around \( 0 \) and with dispersion \( \sigma \).
  4. Repeat steps 2 and 3 until the desired score is reached.
Implementation
The Agent class
The agents are implemented as Python classes. Each agent is an instance of the Agent class. The class contains all the necessary infrastructure of an agent- it defines the agent's neural network together with the main operations- initialization, inheritance, mutation, and some helpers.
First off we define the structure of the neural network in the constructor __init__() of the class. It consists of an input layer, two hidden dense layers with ReLU activation functions, and a softmax output layer. The softmax layer always outputs non-negative values which sum to \( 1 \). We will interpret these outputs as probabilities for taking a certain action from the action space of the game. Note that all TensorFlow variables of our agent are defined within a variable scope named after the agent- we can later use that naming in order to address the trainable variables of an agent.
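A minimal sketch of what the constructor could look like (TF1-style; the names obs_dim, n_actions, and hsizes are assumptions for illustration, not the exact code from the repository). The train_vars() helper is shown here as well, since it is the place where the per-agent variable scope pays off:

import numpy as np
import tensorflow as tf

class Agent:
    def __init__(self, name, obs_dim, n_actions, hsizes=(24, 24)):
        self.name = name
        # all variables of this agent live under a scope named after the agent
        with tf.variable_scope(name):
            self.obs_ph = tf.placeholder(tf.float32, [None, obs_dim])
            h1 = tf.layers.dense(self.obs_ph, hsizes[0], activation=tf.nn.relu)
            h2 = tf.layers.dense(h1, hsizes[1], activation=tf.nn.relu)
            # softmax output: non-negative values summing to 1, read as action probabilities
            self.probs = tf.layers.dense(h2, n_actions, activation=tf.nn.softmax)

    def train_vars(self):
        # all trainable variables created under this agent's scope
        return tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=self.name)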

Next, we need some helpers: the zero() method nullifies all the entries within the neural network of an agent; the train_vars() method returns the TensorFlow trainable variables which represent an agent's neural network.

Notable methods are: from_population()- it replaces the data associated with an agent with a combination of the data of a collection of predecessor agents (the previous generation). Each predecessor is assigned a weight. In this particular version of the algorithm we always supply only one predecessor- the best-scoring agent, with weight one.
from_agent()- initializes the agent with the data of another agent, with the option to specify a noise function. The noise function is our mutation mechanism. Different noise functions can be specified. In this particular version, we use a Gaussian distribution centered around zero with sigma specified as a hyper parameter.
compute()- takes an observation vector and returns action probabilities as calculated by the agent's internal neural network.
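Continuing the Agent sketch from above, from_agent() and compute() might look roughly like this (the exact signatures in the repository may differ):

    def from_agent(self, sess, parent, noise_fn=None):
        # copy the parent's weights, optionally perturbing every tensor with noise
        for mine, theirs in zip(self.train_vars(), parent.train_vars()):
            value = sess.run(theirs)
            if noise_fn is not None:
                value = value + noise_fn(value.shape)   # e.g. Gaussian mutation
            # building the assign op on the fly keeps the sketch short; a real
            # implementation would cache these ops instead
            sess.run(tf.assign(mine, value))

    def compute(self, sess, obs):
        # action probabilities for a single observation
        obs = np.asarray(obs, dtype=np.float32)[None, :]
        return sess.run(self.probs, {self.obs_ph: obs})[0]

A Gaussian mutation is then simply noise_fn = lambda shape: np.random.normal(0.0, sigma, shape).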
Play
The play() function accepts an OpenAI Gym environment, an Agent object, a number of games, a maximum number of steps per game, and a boolean render flag. The function returns the accumulated score from all games. This routine uses Agent.compute() in order to choose the agent's actions during a sequence of games.
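A sketch of the routine, reusing the imports above and assuming the classic Gym API where step() returns (observation, reward, done, info):

def play(env, agent, sess, num_games, max_steps, render=False):
    total = 0.0
    for _ in range(num_games):
        obs = env.reset()
        for _ in range(max_steps):
            if render:
                env.render()
            probs = agent.compute(sess, obs)                  # action probabilities
            action = np.random.choice(len(probs), p=probs)    # sample an action
            obs, reward, done, _ = env.step(action)
            total += reward
            if done:
                break
    return total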
The learning algorithm
Now that we have defined the behaviour of our agents, it is time to implement the learning routine. The learning routine follows closely the basic idea of the algorithm as described in the previous section. We use the play() routine to generate an accumulated score for each agent within a generation. After that we use Agent.from_population() and Agent.from_agent() in order to use the best-scoring agent as a seed for the next generation. We also store the average and the maximum score for the generation. The history of these provides a good benchmark of the learning process. The rest of the code is timekeeping and some "interactive" control- i.e. we use a file called `1.ctl` in order to indicate when the learning is to be interrupted.
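Putting the pieces together, the core of the learning routine might look roughly like the sketch below. It reuses the Agent and play() sketches from above, the hyper-parameter values are the ones listed in the next section, and the timekeeping and the `1.ctl` interrupt check are omitted:

import gym

num_agents, num_games_per_agent, max_steps = 10, 50, 300    # hyper-parameters, see next section
hsizes, sigma = [24, 24], 0.03

env = gym.make('LunarLander-v2')                             # 8-dimensional observations, 4 actions
agents = [Agent('agent_%d' % i, obs_dim=8, n_actions=4, hsizes=hsizes)
          for i in range(num_agents)]
history = []

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for generation in range(300):                            # roughly the length of the run below
        scores = [play(env, a, sess, num_games_per_agent, max_steps) / num_games_per_agent
                  for a in agents]
        best = agents[int(np.argmax(scores))]
        history.append((np.mean(scores), max(scores)))       # average and maximum score per generation
        # the survivor seeds the next generation; every other agent becomes a mutated copy
        for a in agents:
            if a is not best:
                a.from_agent(sess, best,
                             noise_fn=lambda shape: np.random.normal(0.0, sigma, shape))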
Learning curve
Now that we have all the infrastructure, neural networks, and algorithms in place, the last ingredient is the set of hyper-parameters of the training. For this particular run the following values were chosen:
  • num_agents = 10 (corresponds to \( N_a \) )
  • num_games_per_agent = 50 (corresponds to \( N_g \) )
  • max_steps = 300
  • hsizes = [24, 24] sizes of the two hidden layers within the neural network of an agent
  • sigma = 0.03 the dispersion of the Gaussian noise we use for mutations
With the above hyper-parameters the model was run for approximately 300 generations. The progress of the learning is depicted in the figure below.
The plot clearly shows that the agents `evolve` towards better performance. We demonstrate an agent in action in the animation below. It is visible that the agent is able to control the speed and the direction of the lander and to point it towards the landing zone.
Space for improvements
As mentioned before, this is a vanilla implementation of an evolution algorithm, so there are many places where it can be improved. Here we list a few:
  • \( N_g \) is too big. We might increase the speed of the training if we decrease the number of games every agent plays. We keep \( N_g > 1 \) only because we need to have statistical certainty about the relative performance of the agents within the same generation.
  • We throw away \( 90\% \) of the information. We eliminate \( 9 \) out of \( 10 \) agents per generation. The second and third best agents also contain useful information. Even the worst performing agent contains information on what kind of behavior is to be avoided. Hardmaru describes some modifications of the algorithm which allow for generating the next generation as a mixture of multiple parents.
  • \( \sigma \) is kept constant during the whole training process. Maybe we can use an adaptive \( \sigma \) during training- keeping it high at the beginning of the training and decreasing it as the learning curve saturates. This would allow for faster training at the beginning and more precise adjustments towards the end.
  • The algorithm might perform comparably with smaller hidden layers. That would make the training faster, but the only way to know for sure is to experiment.
Final remarks
In comparison to other RL algorithms like policy gradient and deep Q-learning, this evolution strategy algorithm is easier to implement and test. This makes it a good candidate for beginners. Another important advantage of the algorithm is that it is very easy to parallelize- each agent can run in its own process or on its own machine.

Tuesday, 8 August 2017

Reinforcement learning: Demon Attack


Overview

In this post we discuss the inner workings of a simple RL agent which learns how to play the classic Atari game Demon Attack.
The algorithm is a basic implementation of Policy gradient, which is a type of reinforcement learning algorithm. For more details, Karpathy and Juliani provide very good introductions to Policy gradient methods. The full source code of the algorithm as a Jupyter notebook can be found on GitHub. The agent operates within an OpenAI Gym environment and its policy is implemented in TensorFlow. Hence the requirements:

  • ipython-jupyter
  • numpy
  • Tensorflow
  • OpenAI Gym
  • matplotlib (optional)

Idea of the algorithm

At a high level, the algorithm can be represented by the following block diagram.
It has two main stages:
  1. Dimensionality reduction
  2. Training

Dimensionality reduction

This first stage is essentially a preprocessing stage. The goal is to detect the actual dimension of the input data. Every state of the game is represented by an array of 128 integer values. Our agent uses 2 (or optionally 3) consecutive frames when making a decision. The reason for using more than one frame is to also provide dynamic information- i.e. the difference between two consecutive frames contains information about all moving objects- bullets, enemies, etc. In total, 2 consecutive observations of the environment represent an \( n_d = 256 \) dimensional state. We could feed that state directly into our neural network. However, we can use Principal Component Analysis in order to estimate the effective dimensionality of our input data first. An observation of the environment contains some representation, opaque to us, of the state of the game. This state encodes information about the position in space of the agent, the enemies, current lives, bullets, etc. Intuitively we can suspect that not all possible states are observed- i.e. there are some restrictions with respect to the relative positions of the agents on the screen, for example.

In order to "measure" the dimensionality of an observation, we need to obtain a collection of many observations. The easiest way to do so is to let an uninitialized or a random walk agent to play several games while we collect the history of the encountered states. In our runs we have used typically \( n_g=200 \) games with maximum number of allowed steps \( 3000 \). These leads to a maximum of \( n_s=600000 \) collected states. We represent them as the matrix \( S \) of size \( 600000 \times 256 \). Each row of the matrix is a separate state and each column represents the evolution in time of the state entry enumerated by the number of the column. In what follows we will be more interested in the rows of the matrix.

After the data is collected, we perform a Singular Value Decomposition (SVD) on the matrix \( S \). A more detailed explanation of SVD can be found at www.deeplearningbook.org. It is essentially a decomposition of the matrix into three matrices with known properties:


\( S = U . s . V \)

  • \( U \) is a unitary matrix which is constructed by the eigenvectors of the matrix \( SS^{T} \). Its size is \( n_s \) by \( n_s \).
  • \( s \) is zero everywhere except the elements on the main diagonal. They have positive values in descending order- the so-called singular values (SVs). We denote them with \( \lambda \).
  • \( V \) is a unitary matrix which is constructed by the eigenvectors of the matrix \( S^{T}S \). Its size is \( n_d \) by \( n_d \).

We are interested in the relative magnitude of the singular values and their corresponding vectors from \( V \). The bigger the SV which corresponds to a vector from \( V \), the bigger the contribution of the given mode. It turns out that the magnitudes of the different SVs differ by several orders of magnitude...


As we can see, the difference between the biggest and the smallest SV is \( 18 \) orders of magnitude. This is to be expected, as intuitively we know that not all possible states are observed. This allows us to approximate our observations with a very high degree of accuracy by using only the vectors from \( V \) which correspond to the biggest SVs. In our experiments we choose to use the first \( 160 \) vectors. This allows us to reduce the dimension of the input data from \( 256 \) to \( 160 \) while still working with an accuracy of about \( 10^{-6} \).
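With numpy this amounts to something like the following sketch (note that np.linalg.svd returns the right singular vectors as the rows of Vt; the choice of k = 160 follows the text above):

U, s, Vt = np.linalg.svd(S, full_matrices=False)   # s holds the singular values in descending order
k = 160
V_k = Vt[:k]                                       # the right singular vectors with the largest SVs
S_reduced = S @ V_k.T                              # project every 256-dim state onto k dimensions
S_restored = S_reduced @ V_k                       # back-projection, accurate to roughly 1e-6 here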


Training

After we have reduced the dimensionality of the input data, we can proceed with implementing the agent and the training algorithm.
The agent is implemented as a simple neural network with one hidden layer of \( 150 \) neurons. The degrees of freedom of our NN correspond to the parameters of the following matrices and vectors: \( W^{1}_{160, 150} \) , \( b^{1}_{150} \), \( W^2_{150, 6} \) , and \( b^2_{6} \). The matrices \( W^1 \) and \( W^2 \) correspond to the connections between the neurons of the input layer and the hidden layer, and between the hidden layer and the output layer. The vectors \( b^1 \) and \( b^2 \) are the corresponding biases. The activation function of the hidden layer is ReLU. The output layer uses softmax activations- this way the output can be interpreted as a probability distribution over the action space. We start with an agent with a randomly initialized policy neural network. We need to make sure that our agent, even though randomly initialized, assigns nonzero probability to each of the \( 6 \) possible actions in most situations- i.e. an agent which always moves left and never shoots is doomed to never make any decision which generates points. The perfect initial condition would be if our agent assigned equal probabilities to all actions in any situation. We estimate the probability distribution over the different actions of an agent before training from the history of a batch of 200 games:
The histogram shows a good agreement with a uniform distribution over the action space. Therefore our agent will explore all possible actions under most conditions.

We start the training with \( 200 \) games played by the agent while storing in memory every single observation together with the action performed by the agent. For every game we also store the total points made during the game- for us this plays the role of the reward function. After that we compute the average reward generated by our agent- every game with a higher score than the average is marked as "good" and every game with a lower score is marked as "bad". For each game we define a least-squares loss function \( f(x, y, W^1, b^1, W^2, b^2) \) using the game history as a data set. Then we compute the gradients of the loss function with respect to the parameters of the neural network.
\( \frac{\partial f}{\partial W^1} \)     \( \frac{\partial f}{\partial b^1} \)
\( \frac{\partial f}{\partial W^2} \)     \( \frac{\partial f}{\partial b^2} \)
Then we update the weights of our neural network according to:
\( W^1 \to W^1 + c \frac{\partial f}{\partial W^1} \)
\( b^1 \to b^1 + c \frac{\partial f}{\partial b^1} \)
\( W^2 \to W^2 + c \frac{\partial f}{\partial W^2} \)
\( b^2 \to b^2 + c \frac{\partial f}{\partial b^2} \)
The parameter \( c \) is \( -1 \) for a good game and \( +1 \) for a bad game. This way the probabilities of the decisions made during a good game increase and vice-versa. After we process all the games in the current batch, we plot the progress. We can run as many generations as we like and observe the results:
As we can see from the figure, statistically our trained agent starts to perform significantly better after a couple of thousand games. While it still might happen that the trained agent occasionally performs worse, on average it is better, as shown by the "moving averages" plot. Here we also demonstrate a game played by an initial-generation agent and a game played by a later-generation one:
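A minimal TF1-style sketch of the network and of the signed update rule described above (the placeholder names, the one-hot encoding of actions, and the learning rate are assumptions for illustration; the notebook's actual code may differ):

import tensorflow as tf

obs_ph  = tf.placeholder(tf.float32, [None, 160])    # reduced observations of one game
act_ph  = tf.placeholder(tf.float32, [None, 6])      # one-hot encoding of the actions taken
sign_ph = tf.placeholder(tf.float32, [])             # c: -1 for a "good" game, +1 for a "bad" one

hidden = tf.layers.dense(obs_ph, 150, activation=tf.nn.relu)
probs  = tf.layers.dense(hidden, 6, activation=tf.nn.softmax)
loss   = tf.reduce_sum(tf.square(probs - act_ph))    # least-squares loss over the game history

params  = tf.trainable_variables()                   # W1, b1, W2, b2
grads   = tf.gradients(loss, params)
lr      = 1e-3                                       # assumed learning rate
updates = [tf.assign_add(p, sign_ph * lr * g) for p, g in zip(params, grads)]

# for every stored game:
#   sess.run(updates, {obs_ph: game_obs, act_ph: game_actions,
#                      sign_ph: -1.0 if game_score > mean_score else +1.0})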

Further improvements

There are many ways we can improve the algorithm while preserving its basic idea. Here we state the ones that come to mind:
  • Changing the topology of the neural network. We can either increase the number of the neurons in the hidden layer or add more layers to the network altogether. While this looks promising, it will inevitably increase the number of played games needed to train our agent.
  • Introducing reward discount. We can introduce a parameter \( \gamma < 1 \) which exponentially discounts rewards that are further away in time. This is a very common technique and it might significantly increase the performance of the training algorithm.
  • Tuning the hyper-parameters of the model- size of the training batch, learning rate, maximum number of steps per game, etc. Finding the right parameters is indeed very important, as in many cases a good algorithm won't work until a sensible choice of hyper-parameters has been made.

Conclusions

We described a very simple implementation of a Policy gradient algorithm. The algorithm is far from perfect- most likely the convergence can be significantly improved by some of the techniques we already mentioned. The implementation is a generic RL setup and does not rely heavily on the particular game- one can adjust the same algorithm to a different game by simply changing the dimensions of the input and output layers of the neural network.

Monday, 8 February 2016

Spacial resolution of wavelets. Providing higher detail on a particular part of the image

In this second chapter we will use the properties of wavelets we saw previously in order to realize an image transmission format which allows for sending an image with different levels of detail in different regions.

Spacial resolution of wavelet modes


In the previous chapter we saw how an image can be transformed into wavelet modes. We also saw how we can consistently apply low-pass or high-pass filters in order to change the global level of detail of the image. The low-pass filters can be used for noise reduction or for data compression. The high-pass filters can be used for the extraction of higher-order features of the image such as edges. In this chapter we will discuss in more detail the spacial resolution of the different modes \(s, x^i, y^i, d^i\).

We start with the first four modes- \(s, x^1, y^1, d^1\). As we mentioned before they are global, i.e. their contribution extends over the whole image. The mode \(s \) plays a role similar to an averaged background. The \(x^1\)-mode carries information about the average left-right gradient of the image. Similarly the \(y^1\)-mode carries information about the average top-bottom gradient of the image, and the \(d^1\)-mode carries information about the average diagonal gradient of the image. The important thing to note here is that all of these modes are completely non-localized in both directions of the image.

Now we move to the higher modes- \(x^2, y^2, d^2\). Each of these second-order modes consists of four coefficients. For example \(x^2 \) is represented by the matrix \(x^2_{1, 1}, x^2_{1, 2}, x^2_{2, 1}, x^2_{2, 2} \). These modes are the first to show one important property that isn't present in \(s, x^1, y^1, d^1\), namely spacial-domain locality. In other words, all modes with \( i \ge 2\) are spacially localized- the higher the value of \( i \), the finer the localization. Let's assume that all the wavelet coefficients are zero with the exception of \(x^2_{1, 1} \). The image of \(x^2_{1, 1} \) in the spacial domain can be seen in the following diagram.

\(s\) \(x^1_{1,1}\) \(x^2_{1,1}\) \(x^2_{1,2}\) \(x^3_{1,1}\) \(x^3_{1,2}\) \(x^3_{1,3}\) \(x^3_{1,4}\)
\(y^1_{1,1}\) \(d^1_{1,1}\) \(x^2_{2,1}\) \(x^2_{2,2}\) \(x^3_{2,1}\) \(x^3_{2,2}\) \(x^3_{2,3}\) \(x^3_{2,4}\)
\(y^2_{1,1}\) \(y^2_{1,2}\) \(d^2_{1,1}\) \(d^2_{1,2}\) \(x^3_{3,1}\) \(x^3_{3,2}\) \(x^3_{3,3}\) \(x^3_{3,4}\)
\(y^2_{2,1}\) \(y^2_{2,2}\) \(d^2_{2,1}\) \(d^2_{2,2}\) \(x^3_{4,1}\) \(x^3_{4,2}\) \(x^3_{4,3}\) \(x^3_{4,4}\)
\(y^3_{1,1}\) \(y^3_{1,2}\) \(y^3_{1,3}\) \(y^3_{1,4}\) \(d^3_{1,1}\) \(d^3_{1,2}\) \(d^3_{1,3}\) \(d^3_{1,4}\)
\(y^3_{2,1}\) \(y^3_{2,2}\) \(y^3_{2,3}\) \(y^3_{2,4}\) \(d^3_{2,1}\) \(d^3_{2,2}\) \(d^3_{2,3}\) \(d^3_{2,4}\)
\(y^3_{3,1}\) \(y^3_{3,2}\) \(y^3_{3,3}\) \(y^3_{3,4}\) \(d^3_{3,1}\) \(d^3_{3,2}\) \(d^3_{3,3}\) \(d^3_{3,4}\)
\(y^3_{4,1}\) \(y^3_{4,2}\) \(y^3_{4,3}\) \(y^3_{4,4}\) \(d^3_{4,1}\) \(d^3_{4,2}\) \(d^3_{4,3}\) \(d^3_{4,4}\)
=>
\( a_{1, 1} \) \( a_{1, 2} \) \( a_{1, 3} \) \( a_{1, 4} \) \( a_{1, 5} \) \( a_{1, 6} \) \( a_{1, 7} \) \( a_{1, 8} \)
\( a_{2, 1} \) \( a_{2, 2} \) \( a_{2, 3} \) \( a_{2, 4} \) \( a_{2, 5} \) \( a_{2, 6} \) \( a_{2, 7} \) \( a_{2, 8} \)
\( a_{3, 1} \) \( a_{3, 2} \) \( a_{3, 3} \) \( a_{3, 4} \) \( a_{3, 5} \) \( a_{3, 6} \) \( a_{3, 7} \) \( a_{3, 8} \)
\( a_{4, 1} \) \( a_{4, 2} \) \( a_{4, 3} \) \( a_{4, 4} \) \( a_{4, 5} \) \( a_{4, 6} \) \( a_{4, 7} \) \( a_{4, 8} \)
\( a_{5, 1} \) \( a_{5, 2} \) \( a_{5, 3} \) \( a_{5, 4} \) \( a_{5, 5} \) \( a_{5, 6} \) \( a_{5, 7} \) \( a_{5, 8} \)
\( a_{6, 1} \) \( a_{6, 2} \) \( a_{6, 3} \) \( a_{6, 4} \) \( a_{6, 5} \) \( a_{6, 6} \) \( a_{6, 7} \) \( a_{6, 8} \)
\( a_{7, 1} \) \( a_{7, 2} \) \( a_{7, 3} \) \( a_{7, 4} \) \( a_{7, 5} \) \( a_{7, 6} \) \( a_{7, 7} \) \( a_{7, 8} \)
\( a_{8, 1} \) \( a_{8, 2} \) \( a_{8, 3} \) \( a_{8, 4} \) \( a_{8, 5} \) \( a_{8, 6} \) \( a_{8, 7} \) \( a_{8, 8} \)

In the left table above we show the wavelet modes of an image of size \(2^{3}\times 2^3\), and in the right one we have the same image in the spacial domain. With the red colour we denote the element in which we are interested in wavelet domain and the corresponding coefficients in spacial domain. From the diagram we can see that the projection of the wavelet coefficient \(x^2_{1, 1} \) in the spacial domain extends over the left half of the image. This means that it is localized in the horizontal direction but not in the vertical direction. The projection of \(x^2_{2, 1} \) coincides with that of \(x^2_{1, 1} \), and the coefficients \(x^2_{1, 2} \) and \(x^2_{2, 2} \) contribute to the right half of the image.

Similarly the wavelet coefficient \(y^2_{1, 1} \) is localized in the vertical but non-localized in the horizontal direction.

\(s\) \(x^1_{1,1}\) \(x^2_{1,1}\) \(x^2_{1,2}\) \(x^3_{1,1}\) \(x^3_{1,2}\) \(x^3_{1,3}\) \(x^3_{1,4}\)
\(y^1_{1,1}\) \(d^1_{1,1}\) \(x^2_{2,1}\) \(x^2_{2,2}\) \(x^3_{2,1}\) \(x^3_{2,2}\) \(x^3_{2,3}\) \(x^3_{2,4}\)
\(y^2_{1,1}\) \(y^2_{1,2}\) \(d^2_{1,1}\) \(d^2_{1,2}\) \(x^3_{3,1}\) \(x^3_{3,2}\) \(x^3_{3,3}\) \(x^3_{3,4}\)
\(y^2_{2,1}\) \(y^2_{2,2}\) \(d^2_{2,1}\) \(d^2_{2,2}\) \(x^3_{4,1}\) \(x^3_{4,2}\) \(x^3_{4,3}\) \(x^3_{4,4}\)
\(y^3_{1,1}\) \(y^3_{1,2}\) \(y^3_{1,3}\) \(y^3_{1,4}\) \(d^3_{1,1}\) \(d^3_{1,2}\) \(d^3_{1,3}\) \(d^3_{1,4}\)
\(y^3_{2,1}\) \(y^3_{2,2}\) \(y^3_{2,3}\) \(y^3_{2,4}\) \(d^3_{2,1}\) \(d^3_{2,2}\) \(d^3_{2,3}\) \(d^3_{2,4}\)
\(y^3_{3,1}\) \(y^3_{3,2}\) \(y^3_{3,3}\) \(y^3_{3,4}\) \(d^3_{3,1}\) \(d^3_{3,2}\) \(d^3_{3,3}\) \(d^3_{3,4}\)
\(y^3_{4,1}\) \(y^3_{4,2}\) \(y^3_{4,3}\) \(y^3_{4,4}\) \(d^3_{4,1}\) \(d^3_{4,2}\) \(d^3_{4,3}\) \(d^3_{4,4}\)
=>
\( a_{1, 1} \) \( a_{1, 2} \) \( a_{1, 3} \) \( a_{1, 4} \) \( a_{1, 5} \) \( a_{1, 6} \) \( a_{1, 7} \) \( a_{1, 8} \)
\( a_{2, 1} \) \( a_{2, 2} \) \( a_{2, 3} \) \( a_{2, 4} \) \( a_{2, 5} \) \( a_{2, 6} \) \( a_{2, 7} \) \( a_{2, 8} \)
\( a_{3, 1} \) \( a_{3, 2} \) \( a_{3, 3} \) \( a_{3, 4} \) \( a_{3, 5} \) \( a_{3, 6} \) \( a_{3, 7} \) \( a_{3, 8} \)
\( a_{4, 1} \) \( a_{4, 2} \) \( a_{4, 3} \) \( a_{4, 4} \) \( a_{4, 5} \) \( a_{4, 6} \) \( a_{4, 7} \) \( a_{4, 8} \)
\( a_{5, 1} \) \( a_{5, 2} \) \( a_{5, 3} \) \( a_{5, 4} \) \( a_{5, 5} \) \( a_{5, 6} \) \( a_{5, 7} \) \( a_{5, 8} \)
\( a_{6, 1} \) \( a_{6, 2} \) \( a_{6, 3} \) \( a_{6, 4} \) \( a_{6, 5} \) \( a_{6, 6} \) \( a_{6, 7} \) \( a_{6, 8} \)
\( a_{7, 1} \) \( a_{7, 2} \) \( a_{7, 3} \) \( a_{7, 4} \) \( a_{7, 5} \) \( a_{7, 6} \) \( a_{7, 7} \) \( a_{7, 8} \)
\( a_{8, 1} \) \( a_{8, 2} \) \( a_{8, 3} \) \( a_{8, 4} \) \( a_{8, 5} \) \( a_{8, 6} \) \( a_{8, 7} \) \( a_{8, 8} \)

Finally, the \( d^2 \) wavelets are localized in both the horizontal and the vertical direction, i.e. each of the four coefficients \( d^2_{1, 1}, d^2_{1, 2}, d^2_{2, 1}, d^2_{2, 2} \) contributes to a block of size \(4 \times 4 \).
We illustrate that in the next diagram.

\(s\) \(x^1_{1,1}\) \(x^2_{1,1}\) \(x^2_{1,2}\) \(x^3_{1,1}\) \(x^3_{1,2}\) \(x^3_{1,3}\) \(x^3_{1,4}\)
\(y^1_{1,1}\) \(d^1_{1,1}\) \(x^2_{2,1}\) \(x^2_{2,2}\) \(x^3_{2,1}\) \(x^3_{2,2}\) \(x^3_{2,3}\) \(x^3_{2,4}\)
\(y^2_{1,1}\) \(y^2_{1,2}\) \(d^2_{1,1}\) \(d^2_{1,2}\) \(x^3_{3,1}\) \(x^3_{3,2}\) \(x^3_{3,3}\) \(x^3_{3,4}\)
\(y^2_{2,1}\) \(y^2_{2,2}\) \(d^2_{2,1}\) \(d^2_{2,2}\) \(x^3_{4,1}\) \(x^3_{4,2}\) \(x^3_{4,3}\) \(x^3_{4,4}\)
\(y^3_{1,1}\) \(y^3_{1,2}\) \(y^3_{1,3}\) \(y^3_{1,4}\) \(d^3_{1,1}\) \(d^3_{1,2}\) \(d^3_{1,3}\) \(d^3_{1,4}\)
\(y^3_{2,1}\) \(y^3_{2,2}\) \(y^3_{2,3}\) \(y^3_{2,4}\) \(d^3_{2,1}\) \(d^3_{2,2}\) \(d^3_{2,3}\) \(d^3_{2,4}\)
\(y^3_{3,1}\) \(y^3_{3,2}\) \(y^3_{3,3}\) \(y^3_{3,4}\) \(d^3_{3,1}\) \(d^3_{3,2}\) \(d^3_{3,3}\) \(d^3_{3,4}\)
\(y^3_{4,1}\) \(y^3_{4,2}\) \(y^3_{4,3}\) \(y^3_{4,4}\) \(d^3_{4,1}\) \(d^3_{4,2}\) \(d^3_{4,3}\) \(d^3_{4,4}\)
=>
\( a_{1, 1} \) \( a_{1, 2} \) \( a_{1, 3} \) \( a_{1, 4} \) \( a_{1, 5} \) \( a_{1, 6} \) \( a_{1, 7} \) \( a_{1, 8} \)
\( a_{2, 1} \) \( a_{2, 2} \) \( a_{2, 3} \) \( a_{2, 4} \) \( a_{2, 5} \) \( a_{2, 6} \) \( a_{2, 7} \) \( a_{2, 8} \)
\( a_{3, 1} \) \( a_{3, 2} \) \( a_{3, 3} \) \( a_{3, 4} \) \( a_{3, 5} \) \( a_{3, 6} \) \( a_{3, 7} \) \( a_{3, 8} \)
\( a_{4, 1} \) \( a_{4, 2} \) \( a_{4, 3} \) \( a_{4, 4} \) \( a_{4, 5} \) \( a_{4, 6} \) \( a_{4, 7} \) \( a_{4, 8} \)
\( a_{5, 1} \) \( a_{5, 2} \) \( a_{5, 3} \) \( a_{5, 4} \) \( a_{5, 5} \) \( a_{5, 6} \) \( a_{5, 7} \) \( a_{5, 8} \)
\( a_{6, 1} \) \( a_{6, 2} \) \( a_{6, 3} \) \( a_{6, 4} \) \( a_{6, 5} \) \( a_{6, 6} \) \( a_{6, 7} \) \( a_{6, 8} \)
\( a_{7, 1} \) \( a_{7, 2} \) \( a_{7, 3} \) \( a_{7, 4} \) \( a_{7, 5} \) \( a_{7, 6} \) \( a_{7, 7} \) \( a_{7, 8} \)
\( a_{8, 1} \) \( a_{8, 2} \) \( a_{8, 3} \) \( a_{8, 4} \) \( a_{8, 5} \) \( a_{8, 6} \) \( a_{8, 7} \) \( a_{8, 8} \)

This logic generalises to the higher-order modes- the \(x^i \) modes get more and more compressed in the horizontal direction, the \(y^i \) in the vertical direction, and the diagonal modes \(d^i \) in both directions. In general, if we have an image of size \(2^N \times 2^N \), the mode \(x^i \) contributes to a region of size \(2^{N-i+1} \times 2^N \). Correspondingly, \(y^i \) contributes to a region of size \(2^N \times 2^{N-i+1} \), and \(d^i \) contributes to a region of size \(2^{N-i+1} \times 2^{N-i+1} \).
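As a quick sanity check, this can be written as a small Python helper (a sketch of the sizes stated above, returning width and height of the covered region):

def support(mode, i, N):
    # (width, height) of the spatial region covered by a single level-i coefficient
    # in a 2**N x 2**N image, following the formula above
    full, local = 2 ** N, 2 ** (N - i + 1)
    if mode == 'x':        # localized horizontally, global vertically
        return local, full
    if mode == 'y':        # localized vertically, global horizontally
        return full, local
    if mode == 'd':        # localized in both directions
        return local, local

# support('x', 2, 3) == (4, 8): an x^2 coefficient of an 8x8 image covers half of its width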


Wavelet representation of spacial domain coefficients


In the previous section we showed how the wavelet modes get projected into the spacial domain. Here we will discuss the opposite: which wavelets contribute to a given point \(a_{m, n} \)?
It is not hard to realise that the lowest modes \(s, x^1, y^1, d^1 \) contribute to \(a_{m, n} \) for all \( m, n \), as they have non-localized projections in the spacial domain.
Next we consider the modes \(x^2_{1, 1}, x^2_{1, 2}, x^2_{2, 1}, x^2_{2, 2}\). As we stated in the previous section, the modes \(x^2_{1, 1}, x^2_{2, 1}\) contribute to the left half of the image, and \(x^2_{1, 2}, x^2_{2, 2}\) contribute to the right half of the image. So we choose \(x^2_{1, 1}, x^2_{2, 1}\) if \(1 \le m \le 2^{N-1} \) and \(x^2_{1, 2}, x^2_{2, 2}\) otherwise. Similarly we need \(y^2_{1, 1}, y^2_{1, 2}\) if \(1 \le n \le 2^{N-1} \) and \(y^2_{2, 1}, y^2_{2, 2}\) otherwise. From the diagonal term \(d^2_{a, b}\), we choose \(a=1\) if \(1 \le m \le 2^{N-1} \) and \(a=2\) otherwise, and \(b=1\) if \(1 \le n \le 2^{N-1} \) and \(b=2\) otherwise. Here it is worth noting that although all of the different modes of second order \(x^2, y^2, d^2 \) contain 4 coefficients each, we need two coefficients from \(x^2, y^2\) and only one from \(d^2\). This is because the \(d^2\) mode is better localized in the spacial domain- in both the horizontal and the vertical direction.

Having in mind the spacial extent of each mode, we can list all the modes that contribute to a given point or region of an image. The needed modes are: \begin{equation} a_{m,n} : \{s, x^1, y^1, d^1 \} \cup \{x^i_{a_1, b_1}, y^i_{a_2, b_2}, d^i_{a_3, b_3} \} \end{equation} where \(i=2\ldots N\), \(a_1 = 1\ldots 2^{i-1}\), \(b_1 = (m-1) \div 2^{N-i+1} + 1 \), \(a_2 = (n-1) \div 2^{N-i+1} + 1\), \(b_2 = 1\ldots 2^{i-1}\), \(a_3 = (m-1) \div 2^{N-i+1} + 1 \), \(b_3 = (n-1) \div 2^{N-i+1} + 1\). The sign \(\div \) denotes integer division.
As an illustration, let's suppose that we have our \(2^{3}\times 2^3\) image and we want to preserve the full information of the point \(a_{5, 3}\) while keeping lower details (up to \(s, x^1, y^1, d^1 \) ) for the rest of the image. So we keep the lower modes (up to \(s, x^1, y^1, d^1 \) ) together with all the higher modes that contribute to \(a_{5, 3}\). Our image in wavelet domain would look like:

\( a_{1, 1} \) \( a_{1, 2} \) \( a_{1, 3} \) \( a_{1, 4} \) \( a_{1, 5} \) \( a_{1, 6} \) \( a_{1, 7} \) \( a_{1, 8} \)
\( a_{2, 1} \) \( a_{2, 2} \) \( a_{2, 3} \) \( a_{2, 4} \) \( a_{2, 5} \) \( a_{2, 6} \) \( a_{2, 7} \) \( a_{2, 8} \)
\( a_{3, 1} \) \( a_{3, 2} \) \( a_{3, 3} \) \( a_{3, 4} \) \( a_{3, 5} \) \( a_{3, 6} \) \( a_{3, 7} \) \( a_{3, 8} \)
\( a_{4, 1} \) \( a_{4, 2} \) \( a_{4, 3} \) \( a_{4, 4} \) \( a_{4, 5} \) \( a_{4, 6} \) \( a_{4, 7} \) \( a_{4, 8} \)
\( a_{5, 1} \) \( a_{5, 2} \) \( a_{5, 3} \) \( a_{5, 4} \) \( a_{5, 5} \) \( a_{5, 6} \) \( a_{5, 7} \) \( a_{5, 8} \)
\( a_{6, 1} \) \( a_{6, 2} \) \( a_{6, 3} \) \( a_{6, 4} \) \( a_{6, 5} \) \( a_{6, 6} \) \( a_{6, 7} \) \( a_{6, 8} \)
\( a_{7, 1} \) \( a_{7, 2} \) \( a_{7, 3} \) \( a_{7, 4} \) \( a_{7, 5} \) \( a_{7, 6} \) \( a_{7, 7} \) \( a_{7, 8} \)
\( a_{8, 1} \) \( a_{8, 2} \) \( a_{8, 3} \) \( a_{8, 4} \) \( a_{8, 5} \) \( a_{8, 6} \) \( a_{8, 7} \) \( a_{8, 8} \)
=>
\(s\) \(x^1_{1,1}\) \(x^2_{1,1}\) \(x^2_{1,2}\) \(x^3_{1,1}\) \(x^3_{1,2}\) \(x^3_{1,3}\) \(x^3_{1,4}\)
\(y^1_{1,1}\) \(d^1_{1,1}\) \(x^2_{2,1}\) \(x^2_{2,2}\) \(x^3_{2,1}\) \(x^3_{2,2}\) \(x^3_{2,3}\) \(x^3_{2,4}\)
\(y^2_{1,1}\) \(y^2_{1,2}\) \(d^2_{1,1}\) \(d^2_{1,2}\) \(x^3_{3,1}\) \(x^3_{3,2}\) \(x^3_{3,3}\) \(x^3_{3,4}\)
\(y^2_{2,1}\) \(y^2_{2,2}\) \(d^2_{2,1}\) \(d^2_{2,2}\) \(x^3_{4,1}\) \(x^3_{4,2}\) \(x^3_{4,3}\) \(x^3_{4,4}\)
\(y^3_{1,1}\) \(y^3_{1,2}\) \(y^3_{1,3}\) \(y^3_{1,4}\) \(d^3_{1,1}\) \(d^3_{1,2}\) \(d^3_{1,3}\) \(d^3_{1,4}\)
\(y^3_{2,1}\) \(y^3_{2,2}\) \(y^3_{2,3}\) \(y^3_{2,4}\) \(d^3_{2,1}\) \(d^3_{2,2}\) \(d^3_{2,3}\) \(d^3_{2,4}\)
\(y^3_{3,1}\) \(y^3_{3,2}\) \(y^3_{3,3}\) \(y^3_{3,4}\) \(d^3_{3,1}\) \(d^3_{3,2}\) \(d^3_{3,3}\) \(d^3_{3,4}\)
\(y^3_{4,1}\) \(y^3_{4,2}\) \(y^3_{4,3}\) \(y^3_{4,4}\) \(d^3_{4,1}\) \(d^3_{4,2}\) \(d^3_{4,3}\) \(d^3_{4,4}\)

Except for the wavelet modes in colour, we can set everything else to zero. This way we have the information for a low resolution image together with higher level details for the point \(a_{5, 3}\). The above recipe can be easily translated into an algorithm.
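A sketch of that algorithm in Python, following the formula and the index conventions above (1-based indices; the string labels are just for illustration):

def contributing_modes(N, m, n):
    # list the wavelet modes needed to fully reconstruct pixel a_{m,n} of a 2**N x 2**N image
    modes = ['s', 'x1_1,1', 'y1_1,1', 'd1_1,1']          # the global modes are always needed
    for i in range(2, N + 1):
        cell = 2 ** (N - i + 1)                          # spatial extent of one level-i coefficient
        bm = (m - 1) // cell + 1                         # localization index along m
        bn = (n - 1) // cell + 1                         # localization index along n
        modes += ['x%d_%d,%d' % (i, a, bm) for a in range(1, 2 ** (i - 1) + 1)]
        modes += ['y%d_%d,%d' % (i, bn, b) for b in range(1, 2 ** (i - 1) + 1)]
        modes += ['d%d_%d,%d' % (i, bm, bn)]
    return modes

# contributing_modes(3, 5, 3) lists the modes needed for the a_{5, 3} example above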

The algorithm in action


We have implemented the algorithm discussed so far in our gstreamer element from the previous chapter. The latest version is also available on GitHub. Let's start with the following pipeline

gst-launch-1.0 videotestsrc pattern=22 ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=h2 band=low inverse=1 cutoff=512 ! videoconvert ! autovideosink

The pipeline displays pattern 22 from gstreamer's videotestsrc element- spikes coming from the centre of the image.


We can also see the image in wavelet domain by switching the inverse property.

gst-launch-1.0 videotestsrc pattern=22 ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=h2 band=low inverse=0 cutoff=512 ! videoconvert ! autovideosink


So far we haven't demonstrated any new functionality apart from what was already available before. The new functionality of the element can be accessed via 5 new properties.
  • phof- boolean. Enable/disable a sub-region of the image where we want to preserve all the details.
  • phofx- The left border of the rectangular area in which higher order modes are preserved.
  • phofy- The upper border of the rectangular area in which higher order modes are preserved.
  • phofw- The width of the rectangular area in which higher order modes are preserved.
  • phofh- The height of the rectangular area in which higher order modes are preserved.

The above properties allow us to define a rectangular area in which we preserve all the information of the image, while keeping the level of detail of the rest of the image lower. In order to demonstrate that, let's first reduce the quality of the image by using our element as a low-pass filter and preserve only the first \(128\times 128\) modes of the image.

gst-launch-1.0 videotestsrc pattern=22 ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=h2 band=low inverse=1 cutoff=128 ! videoconvert ! autovideosink

The resultant image has \(1 / 16\) of the information of the original image, and the loss of quality is noticeable. In wavelet domain it looks like

Now let's suppose that we want to preserve all the details of a square region of the image with top left corner at coordinates \(150, 100\), width \(200\), and height \(200 \). We can achieve that with the pipeline

gst-launch-1.0 videotestsrc pattern=22 ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=h2 band=low inverse=1 cutoff=128 phof=1 phofx=150 phofy=100 phofw=200 phofh=200 ! videoconvert ! autovideosink


We can see that our goal is achieved- we were able to preserve the finest details of the image in a region while keeping the resolution of the rest of the image low. Let's now look at the image in wavelet domain.

gst-launch-1.0 videotestsrc pattern=22 ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=h2 band=low inverse=0 cutoff=128 phof=1 phofx=150 phofy=100 phofw=200 phofh=200 ! videoconvert ! autovideosink


As we can see from the picture above, there are still quite a few wavelet modes which are omitted (the black regions). We can pack all of the non-zero modes into one continuous array and still be able to restore the original image without any extra information. That is because if we know the size of the image and the coordinates of the preserved region, we know which wavelets we need in order to reconstruct it. We just need some sort of convention, let's say that the \(x\) modes are packed before the \( y \) and \( d\) modes.

Limitations


The one limitation which showed up immediately is that if we keep the level of detail of the image much lower than the level of detail of the region that we want to preserve, some artifacts show up. This can be demonstrated with the pipeline

gst-launch-1.0 videotestsrc pattern=22 ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=h2 band=low inverse=1 cutoff=16 phof=1 phofx=150 phofy=100 phofw=200 phofh=200 ! videoconvert ! autovideosink


And in wavelet domain:



Conclusion


The demonstrated element has the capability to reduce the overall quality of an image while preserving all the details in a particular region of it. This is achieved thanks to the localization properties of the wavelets- the higher the wavelet number, the better it is localized in the spacial domain, in the horizontal, the vertical, or both directions.

Thursday, 10 September 2015

DWT element for Gstreamer


A few words

Wavelet transformations play an important role in signal processing. They have a broad range of applications including, but not limited to, audio/video compression, noise reduction (and blur effects), edge detection, and many more.
There are many excellent resources on the web, so here I omit the mathematics. In order to understand how DWTs work in practice, it is beneficial to have a live gstreamer element which can be hooked into a pipeline so that we can see how it transforms images in real time. Here I demonstrate a simple Gstreamer element which uses the GNU Scientific Library (GSL) to perform a discrete wavelet transform on an image. The complete source code can be found on GitHub.

About the element

The element is based on the standard gstreamer plugin template (boilerplate). At the time of writing it accepts only the video/x-raw,format=GRAY8 format and outputs the same. Also, the width and the height of the image should be equal, and that size should be \(2^N\) due to the restrictions of the gsl_wavelet routines.
The element supports a few properties which allow us to experiment with different wavelet families- Haar, Daubechies, and biorthogonal B-spline- and different member numbers, etc. The properties of the element are:
  • wavelet- Fully specifies the wavelet's family and member number. The wavelet is specified by a single letter 'h' (Haar), 'd' (Daubechies), 'b' (B-spline) followed by the number of the member, i.e. 'h2' means Haar wavelet number 2. The centered versions are available if we add 'c'- 'dc4' produces the Daubechies centered wavelet number 4.
  • band- Specifies if the element is acting as low-pass or high-pass filter.
  • inverse- whether or not an inverse DWT should be applied after the filtering
  • cutoff- the number of modes in wavelet basis to be zeroed.
In the rest of the text we are going to use mainly the Haar wavelet with number \(2\), unless explicitly specified otherwise. The structure of the data after transformations with the Daubechies and B-spline wavelets is similar- the most important difference is that these wavelets overlap as basis vectors.

Plugin the element into a working pipeline

Now that we have a brief description of the element, let's see what we can do with it. First off we start by simply attaching the element to a pipeline just to make sure its CAPS are compatible with the rest of the gstreamer elements. We can achieve this with the pipeline:
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=h2 band=high inverse=true cutoff=0 ! videoconvert ! ximagesink
The element takes the image data, transforms it to the wavelet domain and then transforms it back. The net effect in the above pipeline is that our element serves just as a very computationally expensive identity element. The output looks like
Fig 1. Original image

An image in wavelet domain

Now we can actually see the wavelet transformation our element is performing if we disable the inverse transformation
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=h2 band=high inverse=false cutoff=0 ! videoconvert ! ximagesink
Now the element transforms the input image in-place into the wavelet basis but does not transform it back. The output looks like
Fig 2. The image in wavelet domain
The above figure is an important demonstration of the way an image is stored after a wavelet transform. The modes to the top left are the "slow" modes; further down and to the right are the "fast" modes. Let's assume we have a black and white image of size \(2^3 \times 2^3\). It can be represented as the matrix:
\( a_{1, 1} \) \( a_{1, 2} \) \( a_{1, 3} \) \( a_{1, 4} \) \( a_{1, 5} \) \( a_{1, 6} \) \( a_{1, 7} \) \( a_{1, 8} \)
\( a_{2, 1} \) \( a_{2, 2} \) \( a_{2, 3} \) \( a_{2, 4} \) \( a_{2, 5} \) \( a_{2, 6} \) \( a_{2, 7} \) \( a_{2, 8} \)
\( a_{3, 1} \) \( a_{3, 2} \) \( a_{3, 3} \) \( a_{3, 4} \) \( a_{3, 5} \) \( a_{3, 6} \) \( a_{3, 7} \) \( a_{3, 8} \)
\( a_{4, 1} \) \( a_{4, 2} \) \( a_{4, 3} \) \( a_{4, 4} \) \( a_{4, 5} \) \( a_{4, 6} \) \( a_{4, 7} \) \( a_{4, 8} \)
\( a_{5, 1} \) \( a_{5, 2} \) \( a_{5, 3} \) \( a_{5, 4} \) \( a_{5, 5} \) \( a_{5, 6} \) \( a_{5, 7} \) \( a_{5, 8} \)
\( a_{6, 1} \) \( a_{6, 2} \) \( a_{6, 3} \) \( a_{6, 4} \) \( a_{6, 5} \) \( a_{6, 6} \) \( a_{6, 7} \) \( a_{6, 8} \)
\( a_{7, 1} \) \( a_{7, 2} \) \( a_{7, 3} \) \( a_{7, 4} \) \( a_{7, 5} \) \( a_{7, 6} \) \( a_{7, 7} \) \( a_{7, 8} \)
\( a_{8, 1} \) \( a_{8, 2} \) \( a_{8, 3} \) \( a_{8, 4} \) \( a_{8, 5} \) \( a_{8, 6} \) \( a_{8, 7} \) \( a_{8, 8} \)
where every entry \(a_{i, j}\) represents the brightness of the corresponding point- i.e. the element \(a_{1, 1}\) corresponds to the brightness of the top left corner. After the transformation, the data of the image in wavelet domain looks like:
\(s\) \(x^1_{1,1}\) \(x^2_{1,1}\) \(x^2_{1,2}\) \(x^3_{1,1}\) \(x^3_{1,2}\) \(x^3_{1,3}\) \(x^3_{1,4}\)
\(y^1_{1,1}\) \(d^1_{1,1}\) \(x^2_{2,1}\) \(x^2_{2,2}\) \(x^3_{2,1}\) \(x^3_{2,2}\) \(x^3_{2,3}\) \(x^3_{2,4}\)
\(y^2_{1,1}\) \(y^2_{1,2}\) \(d^2_{1,1}\) \(d^2_{1,2}\) \(x^3_{3,1}\) \(x^3_{3,2}\) \(x^3_{3,3}\) \(x^3_{3,4}\)
\(y^2_{2,1}\) \(y^2_{2,2}\) \(d^2_{2,1}\) \(d^2_{2,2}\) \(x^3_{4,1}\) \(x^3_{4,2}\) \(x^3_{4,3}\) \(x^3_{4,4}\)
\(y^3_{1,1}\) \(y^3_{1,2}\) \(y^3_{1,3}\) \(y^3_{1,4}\) \(d^3_{1,1}\) \(d^3_{1,2}\) \(d^3_{1,3}\) \(d^3_{1,4}\)
\(y^3_{2,1}\) \(y^3_{2,2}\) \(y^3_{2,3}\) \(y^3_{2,4}\) \(d^3_{2,1}\) \(d^3_{2,2}\) \(d^3_{2,3}\) \(d^3_{2,4}\)
\(y^3_{3,1}\) \(y^3_{3,2}\) \(y^3_{3,3}\) \(y^3_{3,4}\) \(d^3_{3,1}\) \(d^3_{3,2}\) \(d^3_{3,3}\) \(d^3_{3,4}\)
\(y^3_{4,1}\) \(y^3_{4,2}\) \(y^3_{4,3}\) \(y^3_{4,4}\) \(d^3_{4,1}\) \(d^3_{4,2}\) \(d^3_{4,3}\) \(d^3_{4,4}\)
Unlike the original form, where every entry has the same role and is equally important, in the above representation different elements have different importance. For example, the top-left element \(s\) plays a role similar to an average (although it's not exactly an average), so it contributes to every pixel in the standard form. Also the elements \(x^1, y^1, d^1\) have a global contribution and carry information about the gradient from left to right, top to bottom, and along the diagonal. Next are the \(x^2, y^2, d^2\) elements- each of them carries information for only half of the image (either the left or right half, the top or bottom half, or along the main diagonal). The pattern should be clear- as we move further from the origin \(s\), the modes start to carry more detailed or "higher frequency" information. Now that we know something about the meaning of the different modes in the wavelet domain, we can start to experiment by removing some of the modes and seeing how that transforms the image.
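For intuition, one level of the Haar-like decomposition can be sketched in a few lines of Python. This is only an illustration of where the \(s, x, y, d\) blocks come from; the element itself relies on GSL's gsl_wavelet2d routines, and the exact normalization differs per wavelet:

import numpy as np

def haar_step(a):
    # one decomposition level of a square image with an even side:
    # averages/differences along the horizontal direction, then along the vertical one
    avg = (a[:, 0::2] + a[:, 1::2]) / 2.0
    dif = (a[:, 0::2] - a[:, 1::2]) / 2.0
    s_block = (avg[0::2, :] + avg[1::2, :]) / 2.0     # averaged background (the "s"-like block)
    x_block = (dif[0::2, :] + dif[1::2, :]) / 2.0     # left-right gradients (the x modes)
    y_block = (avg[0::2, :] - avg[1::2, :]) / 2.0     # top-bottom gradients (the y modes)
    d_block = (dif[0::2, :] - dif[1::2, :]) / 2.0     # diagonal gradients (the d modes)
    return s_block, x_block, y_block, d_block

# applying haar_step again to s_block (and so on) produces the pyramid layout shown above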

Using the element as high-pass filter

So far we have only demonstrated how an image looks when transformed to the wavelet domain. In the next two sections we will experiment with the filter capabilities of the element. The first use is the high-pass filter. In order to do that we need to remove the modes residing in the top-left corner of the data- starting with \(s\), then \(x^1, y^1, d^1\), and so on. We should keep in mind that in order to do the filtering uniformly across the whole image, we should manipulate all the entries up to a given level of detail. We can start by removing \(s\)- it is just one element; then we can remove the \(x^1, y^1, d^1\) entries- we need to remove a \( 2\times 2\) block. In order to filter out the \(x^2, y^2, d^2\) entries as well, we would need to manipulate the top-left block of size \( 4\times 4\) and so on... First we transform the image and cut off the first \(64 \times 64\) modes. The pipeline that does that is
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=hc2 band=high inverse=false cutoff=64 ! videoconvert ! ximagesink
The image in wavelet domain looks like
Fig 3. Image in wavelet domain after applying high-pass filter
We can clearly see that the square block that contains the lower-frequency modes in the top left has been nullified. Now let's see what this filtering did to the image
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=hc2 band=high inverse=true cutoff=64 ! videoconvert ! ximagesink
Fig 4. The inverse transformed image after applying high-pass filter
As we can see, our filter preserved only the high-frequency features of the image- i.e. edges and noise (the snowy block in the lower right corner). That way the element can be used as an imperfect edge detector. By removing even more of the lower-frequency modes from the image, we can make the edge detection more accurate, but some of the initial edges are lost. Sometimes better results can be obtained if we switch to another wavelet family.
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=hc2 band=high inverse=true cutoff=128 ! videoconvert ! ximagesink
Fig 5. The inverse transformed image after applying high-pass filter II

Using the element as low-pass filter

Let's now try to use the element as a low-pass filter. If we take another look at Fig 2 we can see that there are 3 times more modes \(x^2, y^2, d^2\) than all the modes of lower order together. The same is valid for the modes of order \(x^3, y^3, d^3\)- their number is 3 times the number of all lower-frequency modes, and so on... This means that if we want to remove only the finest details from the image, we need to remove \(75\%\) of the total modes in the image. In the next pipeline we start with an image of size \(512 \times 512 \) (having \(512 \times 512 \) modes). Then we remove the highest-detail modes, so we end up with \(256 \times 256 \) modes, which is only a quarter of the initial data.
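In wavelet-domain terms, both the high-pass and the low-pass filtering amount to zeroing a block of coefficients in the layout of Fig 2. A numpy sketch of the masking follows (the element itself does this in C on the GSL output, so treat the exact semantics of cutoff here as an assumption):

def apply_band(w, cutoff, band):
    # w: square array of wavelet coefficients laid out as in Fig 2
    out = w.copy()
    if band == 'high':
        out[:cutoff, :cutoff] = 0.0     # drop the slow modes in the top-left block
    else:
        out[cutoff:, :] = 0.0           # keep only the top-left cutoff x cutoff block
        out[:, cutoff:] = 0.0
    return out
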
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=hc2 band=low inverse=false cutoff=256 ! videoconvert ! ximagesink
Fig 6. Image in wavelet domain after applying low-pass filter
The next thing is to apply the inverse transform to the image and see how well it was preserved with \(25\%\) of the data.
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=hc2 band=low inverse=true cutoff=256 ! videoconvert ! ximagesink
Fig 7. The inverse transformed image after applying low-pass filter
If we compare with Fig 1 we see that the image was restored quite well from only a quarter of the initial data. Let's push on and remove the next level of detail. This means we preserve only \(128 \times 128 \) modes, which reduces the size of the data by a factor of \( 16 \).
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=hc2 band=low inverse=false cutoff=128 ! videoconvert ! ximagesink
Fig 8. Image in wavelet domain after applying low-pass filter II
The inverse transformed image is:
Fig 9. The inverse transformed image after applying low-pass filter II
Again the image is quite well preserved. Let's make the ultimate push and leave only a block of \(16 \times 16 \) modes in the wavelet domain. This is about \( 0.1\% \) of the size of the initial data!
gst-launch-1.0 videotestsrc ! videoconvert ! video/x-raw,width=512,height=512 ! dwtfilter wavelet=hc2 band=low inverse=true cutoff=32 ! videoconvert ! ximagesink
The image is still recognisable
Fig 10. The inverse transformed image after applying low-pass filter III
Although this strategy looks very promising, one can argue that the image we started with wasn't very complex. That's true- a quality photo would indeed be ruined if we preserved only \( 1\% \) of its data. Still, removing only the highest details would probably be a good approximation. Another application of the low-pass filter is to remove noise from noisy images- we can see that by removing the high-frequency modes, the noisy part in the bottom right corner gets smoothed.