The performance of a neural network depends heavily on how well the training data matches the real data encountered during inference. Networks for image denoising, deblurring or super-resolution in particular rely on small-scale information at the pixel level. A typical pipeline for generating synthetic training data for these tasks is to start out with a high-quality image, degrade it, and use the degraded version as the input during training.
In this blog post I want to introduce my pipeline for taking a perfectly fine image and making it ugly. BUT in a realistic way.
The following explanation probably leaves out a few steps, but I’ve tried to link to the Wikipedia page of every technical term, and the PyTorch source code is available in the “degradr” GitHub repository.
The Theory
To start out, imagine a photo before it even hits your lens. It’s perfect: no blur, no chromatic aberration, no JPEG artifacts and no noise. Well, tough luck, shot noise is everywhere. The light then gets refracted by your lens and focused onto the sensor. Along the way, several optical aberrations start to degrade your image. Then the color filter array (CFA) of your camera filters out light of certain wavelengths. The remaining light hits your sensor, causing photoelectrons to be emitted. At the end of the exposure, the photoelectrons of each pixel get amplified according to the camera gain, and then the ADC converts them into data numbers (DN). This step introduces both read noise and quantization noise.
Now you have a digital image, but probably not one you’d want to look at just yet. The camera white balance still needs to be applied, followed by demosaicing and a color space transformation to the color space of your choice. If you don’t shoot RAW, your camera will even compress your image to a JPEG file.
Implementation
Coming up: how to simulate all that fast enough to be done live on the CPU while training a neural network.
Optical Aberrations
The first and probably most complex step is simulating the optical aberrations. A fast way to do this is to convolve the image with a color-dependent point spread function (PSF). This PSF should be as realistic as possible, and fortunately devices exist for measuring it for a given lens. Unfortunately I’m not willing to spend a “request a quote” amount of money on one, so we’ll just approximate instead. For generating semi-realistic PSFs I’ve used the prysm library in combination with random Zernike polynomials. These PSFs need to be generated as a preprocessing step before training, while the convolution itself can be done live. Here are some examples:
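In code, the color-dependent convolution boils down to a grouped convolution in PyTorch. Here is a minimal sketch, assuming the PSFs were already precomputed (e.g. with prysm) and stored as a (3, k, k) tensor with odd kernel size; the function and file names are illustrative, not the degradr API:

```python
import torch
import torch.nn.functional as F

def convolve_with_psf(image: torch.Tensor, psf: torch.Tensor) -> torch.Tensor:
    """Convolve an RGB image (3, H, W) with a per-channel PSF (3, k, k).

    Each color channel gets its own kernel, which is what produces the
    chromatic aberration: the red, green and blue PSFs differ slightly.
    Note that conv2d computes a cross-correlation; for asymmetric PSFs
    flip the kernels beforehand.
    """
    k = psf.shape[-1]
    weight = psf.reshape(3, 1, k, k)  # one kernel per channel
    blurred = F.conv2d(image.unsqueeze(0), weight, padding=k // 2, groups=3)
    return blurred.squeeze(0)

# usage (illustrative): pick a random precomputed PSF per training sample
# psf_bank = torch.load("psf_bank.pt")          # (N, 3, k, k)
# blurred = convolve_with_psf(clean_image, psf_bank[random_index])
```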
CFA
After being refracted through the lens, the light hits an antialiasing filter, which introduces some more blur, followed by the color filter array. Simulating that is easy: we convert the color image into a monochrome one, and for each pixel we keep either the red, green or blue value according to the specified pattern.
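As a concrete sketch, this is how the mosaicing could look for an RGGB Bayer layout (the layout is an assumption; other patterns just permute the indices):

```python
import torch

def apply_bayer_cfa(image: torch.Tensor) -> torch.Tensor:
    """Reduce an RGB image (3, H, W) to a monochrome Bayer mosaic (H, W).

    Assumed RGGB layout:  R G
                          G B
    """
    _, h, w = image.shape
    mosaic = torch.empty(h, w, dtype=image.dtype, device=image.device)
    mosaic[0::2, 0::2] = image[0, 0::2, 0::2]  # red
    mosaic[0::2, 1::2] = image[1, 0::2, 1::2]  # green on red rows
    mosaic[1::2, 0::2] = image[1, 1::2, 0::2]  # green on blue rows
    mosaic[1::2, 1::2] = image[2, 1::2, 1::2]  # blue
    return mosaic
```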
Noise
Now is the time for the different types of noise. First we apply the shot noise, which simply follows a Poisson distribution. Then we multiply the whole image by the gain, followed by adding read noise, which is modelled via two Gaussian distributions. Keep in mind that in reality the amount of read noise also depends on the selected gain; the exact relationship varies between camera models. Lastly we simulate the quantization noise by rounding all values.
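A minimal sketch of that noise chain, assuming the mosaic is given in expected photoelectrons per pixel; the default parameter values are placeholders, and the read noise is collapsed into a single Gaussian here for brevity:

```python
import torch

def add_sensor_noise(photons: torch.Tensor, gain: float = 2.0,
                     read_noise_std: float = 1.5) -> torch.Tensor:
    """Apply shot noise, gain, read noise and quantization to a mosaic
    given in expected photoelectrons per pixel (placeholder parameters)."""
    electrons = torch.poisson(photons)               # shot noise
    dn = electrons * gain                            # amplification
    dn = dn + torch.randn_like(dn) * read_noise_std  # read noise (simplified)
    return torch.round(dn).clamp_(min=0)             # quantization by the ADC
```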
Camera White Balance
Given that the filters of the CFA have different sensitivities, all pixels need to be multiplied by a certain factor depending on the color of their respective filter in order to correct this imbalance.
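Applied to the Bayer mosaic, this boils down to scaling the red and blue pixels relative to green; the RGGB layout is again an assumption and the gains are left as parameters:

```python
import torch

def apply_white_balance(mosaic: torch.Tensor, r_gain: float, b_gain: float) -> torch.Tensor:
    """Multiply each Bayer pixel (RGGB layout) by the gain of its filter color.
    Green is treated as the reference, so only red and blue are scaled."""
    out = mosaic.clone()
    out[0::2, 0::2] *= r_gain  # red pixels
    out[1::2, 1::2] *= b_gain  # blue pixels
    return out
```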
Demosaicing
Several different algorithms exist for demosaicing; three of them are implemented in the Intel Integrated Performance Primitives. Unfortunately there isn’t a Python wrapper available for this, so I wrote one myself. The usage is simple: pass in the monochrome image with the Bayer pattern and get a color image back. Depending on the throughput of your network as well as your CPU, this can also be done live, but be aware that AHD demosaicing is significantly slower than the other methods.
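The IPP wrapper itself isn’t reproduced here, but to illustrate the interface (Bayer mosaic in, RGB image out), here is a plain bilinear demosaic written directly in PyTorch; it is only a sketch and noticeably lower quality than the IPP-based methods:

```python
import torch
import torch.nn.functional as F

def demosaic_bilinear(mosaic: torch.Tensor) -> torch.Tensor:
    """Naive bilinear demosaicing of a floating point RGGB mosaic (H, W)
    into an RGB image (3, H, W)."""
    h, w = mosaic.shape
    dtype, device = mosaic.dtype, mosaic.device

    # masks marking where each color was actually sampled
    r_mask = torch.zeros(h, w, dtype=dtype, device=device)
    g_mask = torch.zeros_like(r_mask)
    b_mask = torch.zeros_like(r_mask)
    r_mask[0::2, 0::2] = 1
    g_mask[0::2, 1::2] = 1
    g_mask[1::2, 0::2] = 1
    b_mask[1::2, 1::2] = 1

    # bilinear interpolation kernels for the missing samples
    k_rb = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]], dtype=dtype, device=device) / 4
    k_g = torch.tensor([[0., 1., 0.], [1., 4., 1.], [0., 1., 0.]], dtype=dtype, device=device) / 4

    def interpolate(channel, kernel):
        return F.conv2d(channel.view(1, 1, h, w), kernel.view(1, 1, 3, 3), padding=1).view(h, w)

    r = interpolate(mosaic * r_mask, k_rb)
    g = interpolate(mosaic * g_mask, k_g)
    b = interpolate(mosaic * b_mask, k_rb)
    return torch.stack([r, g, b])
```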
Color Space Transformation
Until now the image is still in the camera color space and would look weird if displayed on a monitor intended for a different color space like sRGB. The color space transformation is luckily very simple: we only have to multiply each RGB pixel (treated as a vector) by a 3×3 transformation matrix. A list of realistic transformation matrices can be found in the LibRaw library.
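A minimal sketch of that step, where `cam_to_srgb` stands for a 3×3 matrix taken from LibRaw’s per-camera tables (not included here):

```python
import torch

def camera_to_srgb(image: torch.Tensor, cam_to_srgb: torch.Tensor) -> torch.Tensor:
    """Apply a 3x3 color space transformation to an RGB image (3, H, W),
    treating each pixel as a 3-vector."""
    return torch.einsum("ij,jhw->ihw", cam_to_srgb, image)
```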
JPEG Artifacts
Finally, if we want our neural network to also be trained on handling JPEG artifacts, we of course have to add those as well. This is implemented via a call to the imageio library.
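A simple way to do that is a JPEG round trip through an in-memory buffer. The sketch below uses imageio’s v3 API and assumes an 8-bit RGB array; the exact call in degradr may differ:

```python
import io
import numpy as np
import imageio.v3 as iio

def jpeg_roundtrip(image: np.ndarray, quality: int = 75) -> np.ndarray:
    """Encode an 8-bit RGB image (H, W, 3) as JPEG and decode it again,
    baking the compression artifacts into the training input."""
    buffer = io.BytesIO()
    iio.imwrite(buffer, image, extension=".jpg", quality=quality)
    buffer.seek(0)
    return iio.imread(buffer, extension=".jpg")
```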
Results
Here you can see an example of the library being tested on a small crop of an image of the Heart Nebula I took a while ago.
In summary: first the input is convolved with a random PSF, which introduces blur and chromatic aberration. Then noise is added, followed by the CFA and demosaicing. Note that the order of noise and CFA doesn’t matter for the implementation, and this way it’s easier to see the impact the CFA and demosaicing have on the noise distribution at the pixel scale. Lastly, JPEG compression is applied, further degrading the image.
The python library is available here: https://github.com/nhauber99/degradr
So far I’ve only tested it on one machine learning project, which is still under development, but I plan on running some tests with readily available neural networks like ESRGAN to compare against the previously used preprocessing steps.