Dizzy Dithering
TLDR: I wrote an error diffusion dithering algorithm that produces results similar to blue noise dithering. I call it dizzy dithering.
Dithering is a technique for emulating the appearance of a large color palette, when you only have access to a small palette. For example, if you only have access to black and white, you can simulate gray by using a checkerboard pattern. This article gives more background on dithering if you're interested (and this post takes a lot of inspiration from it).
Throughout this post I’ll be converting this test image to a palette of just black and white:
The problem dithering is trying to solve is that naive quantization, where we simply set each pixel to the nearest palette color, looks terrible. In this case naive quantization just checks if the pixel brightness (from 0 to 1) is greater than 0.5: out(x, y) = in(x, y) > 0.5 ? 1 : 0.
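For reference, here's a minimal Python sketch of that naive quantization, assuming the image is a 2D numpy array of grayscale values in [0, 1] (the same convention the other sketches below will use):

    import numpy as np

    def naive_quantize(img):
        # Snap each pixel of a grayscale float image in [0, 1] to black or white.
        return (img > 0.5).astype(np.float32)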
We want to simulate the grays by using some pattern of black and white pixels. There are many techniques for dithering, but they can be broadly divided into 2 categories: ordered dithering, and error diffusion.
Ordered Dithering
Ordered dithering is similar to naive quantization, but uses a different quantization threshold at each pixel. Equivalently, you can think of any of these algorithms as adding some offset value to each pixel, before naive quantization is applied.
Bayer dithering is probably the most common ordered dithering algorithm, and uses a fixed pattern of quantization thresholds that can be easily generated using some bitwise operations. You just generate this pattern in any power-of-two size, and tile it over the image (larger patterns give more gradient levels). Then instead of comparing the input image’s pixels to a fixed threshold of 0.5, you compare it to the corresponding pixel in the Bayer pattern: out(x, y) = in(x, y) > bayer(x % n, y % n) ? 1 : 0
where n is the size of the Bayer pattern.
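As a rough Python sketch of how this might look, here the threshold matrix is built with the standard recursive block construction rather than bitwise tricks (both produce the same pattern):

    import numpy as np

    def bayer_matrix(n):
        # Build an n x n Bayer threshold matrix (n a power of two), normalized to (0, 1).
        m = np.array([[0]], dtype=np.int64)
        size = 1
        while size < n:
            m = np.block([[4 * m + 0, 4 * m + 2],
                          [4 * m + 3, 4 * m + 1]])
            size *= 2
        return (m + 0.5) / (n * n)

    def bayer_dither(img, n=8):
        # Tile the threshold matrix over the image and compare pixel-by-pixel.
        h, w = img.shape
        ys, xs = np.indices((h, w))
        thresholds = bayer_matrix(n)[ys % n, xs % n]
        return (img > thresholds).astype(np.float32)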
The main downside of Bayer dithering is that the patterns it creates are very structured and look a bit artificial.
To combat this issue, we can use randomized thresholds instead. We can set the threshold of each pixel to a random value between 0 and 1. If we do this in the obvious way, we end up with white noise dithering: out(x, y) = in(x, y) > randf(0, 1) ? 1 : 0
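Under the same assumptions, a sketch of white noise dithering is only a couple of lines:

    import numpy as np

    def white_noise_dither(img, seed=0):
        # Compare each pixel against an independent uniform random threshold.
        rng = np.random.default_rng(seed)
        return (img > rng.random(img.shape)).astype(np.float32)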
White noise dithering has none of the artificial patterns of Bayer dithering, but it’s a lot more… noisy. Almost all of the details have been lost. Looking closely at the plainer parts of the image, like the sky, we can see random clumps and streaks in the noise. If we look at the thresholds directly, and apply a blur filter, we can see patchiness in the blurred result, which leads to patchiness in the dithering.
Blue noise dithering attempts to remove these low frequency patches from the noise. The name refers to the power distribution in the frequency domain. White noise has a uniform distribution in the frequency domain. Blue noise has a power distribution where the amplitude is proportional to the frequency. In either case, the pixel values should still be uniformly distributed between 0 and 1.
The name is a bit of a misnomer, though, because blue noise dithering doesn't typically use a frequency distribution to generate its threshold map. Instead it uses a technique called void-and-cluster, which is quite an expensive algorithm.
I wasn’t interested in implementing void-and-cluster, so I tried generating actual blue noise using FFT (quite a mathematically interesting puzzle, a post for another day). The results were better than white noise dithering, but not as good as the examples of void-and-cluster I’ve seen. I have a hunch that there are better algorithms for generating void-and-cluster style noise, but that’s not the purpose of this post.
For comparison, I grabbed the example void-and-cluster noise from this article and applied it to the test image.
The visible horizontal and vertical streaks in the sky are an artifact of my crude screenshotting of the example noise. Ignoring that, the results look good. The only issue I can see is that some of the details are still being lost, like the masts of the boats, and the lines of the roof at the bottom are also a bit noisy. I think Bayer dither does a better job in those areas.
Error Diffusion
Error diffusion dithers by iterating over the image, quantizing pixels one at a time. When a pixel is quantized to the nearest palette color, the "error" (the difference between the original color and the palette color) is propagated to the neighboring pixels: error = input_color - quantized_color, adjacent_color += weight * error.
The downside of error diffusion is that, since it iterates over the pixels of the image one at a time, it can’t be implemented in a shader. So these algorithms are not suited to real-time rendering.
The upside is that error diffusion can easily be applied to arbitrary color palettes. Ordered dithering only really makes sense when your restricted color palette forms a regular grid in your color space (for example, a 27 color palette where R, G, and B can be 0, 0.5, or 1 forms a 3x3x3 cube of points in RGB space). If your palette's colors are scattered arbitrarily through your color space, it's not clear how to apply the ordered dithering thresholds. (Maybe triangulate the space using the palette, find the 4 vertices of the simplex the input color is in, and use the dithering thresholds to pick a vertex? Might work, but it'd be tricky).
These irregular palettes come up most often when you’re doing palette optimization. If you choose palette colors that are optimized for your particular image, then they almost certainly won’t be arranged in the sort of grid you need for ordered dithering.
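To illustrate, a hypothetical nearest-color lookup for an arbitrary RGB palette can be as simple as the sketch below; error diffusion only needs this quantization step (plus the propagated error), however irregularly the palette is scattered:

    import numpy as np

    def nearest_palette_color(color, palette):
        # color is a length-3 array, palette is a (k, 3) array of RGB entries.
        # Return the palette entry closest to color in plain Euclidean RGB distance.
        distances = np.sum((palette - color) ** 2, axis=1)
        return palette[np.argmin(distances)]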
A few questions define an error diffusion algorithm:
- What order do we iterate over the pixels?
- Which neighboring pixels do we propagate the error to?
- How much error do we propagate to each of those neighbors?
What makes this trickier is that you can only propagate errors to pixels that haven’t been quantized yet.
The most common error diffusion algorithm is Floyd–Steinberg dithering. It iterates over the image line by line (either left to right like a book, or in a zig-zag pattern). The errors are propagated to the next pixel in the current row, as well as the 3 adjacent pixels in the following row. A weighting factor is applied to each of these error propagations: the standard weights are 7/16 to the next pixel in the row, and 3/16, 5/16, and 1/16 to the pixels below-left, directly below, and below-right.
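A minimal Python sketch of the plain left-to-right variant (no zig-zag), with errors added directly into a working copy of the image:

    import numpy as np

    def floyd_steinberg_dither(img):
        # Error diffusion on a grayscale float image in [0, 1], black/white palette.
        work = img.astype(np.float64).copy()
        out = np.zeros_like(work)
        h, w = work.shape
        # (dx, dy, weight) for: right, below-left, below, below-right.
        weights = [(1, 0, 7 / 16), (-1, 1, 3 / 16), (0, 1, 5 / 16), (1, 1, 1 / 16)]
        for y in range(h):
            for x in range(w):
                out[y, x] = 1.0 if work[y, x] > 0.5 else 0.0
                error = work[y, x] - out[y, x]
                for dx, dy, weight in weights:
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < w and 0 <= ny < h:
                        work[ny, nx] += weight * error
        return out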
The results are quite good, and this has been my go-to dithering algorithm for years, because it’s fast and easy to implement, and preserves the details well (you can still see the masts of the boats). The downside is that there’s quite a lot of artificial structure, and that structure has a clear directional bias (diagonal when using left-to-right iteration, axial when using zig-zag iteration). For certain grays, you get very regular patterns. You can see patches of this effect near the top of the sky, and it makes the sky’s gradient look a bit banded and streaky. And very light (or dark) areas have strange looking lines of dots.
Another approach, Riemersma dithering, is to iterate over the pixels using a space filling curve, like a Hilbert curve. The errors are propagated to the pixels ahead of the current pixel, in the Hilbert iteration order (equivalently, the errors from the last few pixels are remembered, weighted, and added to the current pixel).
I implemented this in a somewhat simpler way, by just keeping a single error value, and updating it like this: error = decay * (error + input_color - quantized_color)
where decay is a constant you can tune (I chose 0.9). Also, since Hilbert curves are power-of-two sized squares, I chose the smallest curve that was larger than the image, and just skipped curve pixels outside the image. This isn’t great, because it means we jump around in a few places along the edge of the image. It would be better to use a space filling curve that’s correctly sized to the image, but one must eventually stop shaving yaks.
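Here's a sketch of that simplified update. The hilbert_coords generator is a hypothetical helper that yields every in-image (x, y) exactly once along the curve, with the out-of-image curve points already skipped:

    def riemersma_like_dither(img, hilbert_coords, decay=0.9):
        # Single running error value, decayed at each step along the curve.
        out = img.copy()
        error = 0.0
        for x, y in hilbert_coords:
            value = img[y, x] + error
            quantized = 1.0 if value > 0.5 else 0.0
            out[y, x] = quantized
            error = decay * (value - quantized)
        return out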
The result is alright, but there’s actually a lot more artificial structure in the sky than I expected. If you look closely, you can even spot some very symmetrical patterns. The detail preservation is not great, but not terrible (boat masts are barely visible).
Dizzy Dithering
Finally we get to my algorithm (I probably didn’t need to implement every other dithering algorithm for comparison, but I was in that sort of mood).
The idea is to iterate completely randomly through the image, and just propagate errors to all the neighboring pixels that haven't been quantized yet. For example, if two of a pixel's eight neighbors have already been processed, propagate 1/6th of its error to each of the remaining six.
Actually the results are slightly better if you weight the error propagation towards the orthogonally adjacent pixels, and propagate less to the diagonal pixels. In my current implementation I’m propagating 10x more error to the orthogonally adjacent pixels. If you propagate equally I find there’s a bit more structure in the result.
The name comes from the way the iteration order jumps around the image at random.
I think the results are great. There’s essentially no artificial structure in this image, and it does a good job of preserving details (you can still see the boat masts). It’s also trivially simple to implement, compared to the other dithering algorithms that have zero structure (the two variants of blue noise dithering).
Another variant I tried was to decay the error a bit when propagating, to prevent errors from randomly propagating a long way. All this really did was increase the contrast of the image, as if I was smoothly blending between the dithered image and naive quantization.
The downside of this algorithm is that a straightforward implementation requires maintaining a lot more state than the other error diffusion algorithms. But since error diffusion algorithms are mainly suited for offline jobs, that might not be a big deal. In any case, it’s possible to optimize away all that state. Let’s go over the basic algorithm, then I’ll talk about optimizations.
The algorithm
There are 3 pieces of state to maintain: the iteration order, a bitmap tracking which pixels have been processed, and the propagated errors. The iteration order is just a shuffled list of all the pixel coordinates (optimization: use pixel indexes instead of coordinates). Iterate over that list, quantize the pixel, propagate the error to the neighbors, and mark that pixel as processed.
To propagate the error (error = input_color - quantized_color), set denom = 0, iterate over the adjacent pixels, and if the adjacent pixel hasn't been processed yet, denom += weight (weight is 1 for orthogonally adjacent pixels, and a tunable value, which I set to 0.1, for diagonally adjacent pixels). Then if denom > 0, iterate over the adjacent pixels again and add error * weight / denom to each unprocessed pixel's error total. In other words, we're just propagating a weighted fraction of the error to each neighbor that hasn't been processed yet.
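Here's a minimal Python sketch of this basic, unoptimized version, again assuming a grayscale float image and a black-and-white palette:

    import numpy as np

    def dizzy_dither(img, diagonal_weight=0.1, seed=0):
        # Visit pixels in random order, pushing each pixel's quantization error
        # onto its not-yet-processed neighbors.
        h, w = img.shape
        errors = np.zeros((h, w))
        processed = np.zeros((h, w), dtype=bool)
        out = np.zeros((h, w))
        # Neighbor offsets with their weights (orthogonal = 1, diagonal = tunable).
        neighbors = [(dx, dy, 1.0 if dx == 0 or dy == 0 else diagonal_weight)
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
        order = np.random.default_rng(seed).permutation(h * w)
        for index in order:
            x, y = index % w, index // w
            value = img[y, x] + errors[y, x]
            out[y, x] = 1.0 if value > 0.5 else 0.0
            processed[y, x] = True
            error = value - out[y, x]
            # Total weight of the neighbors that can still receive error.
            denom = sum(weight for dx, dy, weight in neighbors
                        if 0 <= x + dx < w and 0 <= y + dy < h
                        and not processed[y + dy, x + dx])
            if denom > 0:
                for dx, dy, weight in neighbors:
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < w and 0 <= ny < h and not processed[ny, nx]:
                        errors[ny, nx] += error * weight / denom
        return out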
Optimization
A clever implementation of Floyd–Steinberg or Riemersma only uses O(1) extra memory beyond the image itself. The same is true for dizzy dithering.
Instead of storing the propagated errors in their own array, they can be added directly to the color of the neighboring pixels (this is how Floyd–Steinberg is usually implemented). If your color channels are 8-bit ints then this can cause rounding errors where the fractional part of the propagated error is dropped, but if you’re storing the image using floats/doubles then this isn’t an issue.
The bitmask is only 1 bit per pixel, so it doesn’t really matter. But it can also be eliminated if you’re storing the colors using floats/doubles. Just indicate the pixel is processed by, for example, adding 1000 to the color value. Or maybe your color data has an alpha channel that you’re not using, and you can stick the flag in there. In either case you’ll need to undo this change once the dithering is complete, requiring another iteration over all the pixels, so I probably wouldn’t bother.
The shuffled list of pixel coordinates can be eliminated by using a stateless pseudorandom iteration algorithm. We only use that list to iterate once randomly through the pixels, so really we just need a way of iterating over the pixels in a visually disordered fashion, such that we hit each pixel exactly once.
There are a few math operations that permute all the numbers below 2ⁿ without causing any collisions. Multiplying by an odd number (modulo 2ⁿ) and xoring by any number both work ok. Obviously this isn't as good as a real shuffle, but a few rounds of multiply and xor are good enough for this use case. Each round looks like i = ((i * rand_odd) % (1 << n)) ^ rand_any.
So take the total number of pixels (n = x_res * y_res) and round it up to the next power of two (m = roundUp(n)), then randomly pick some numbers below m to support 3 or more rounds of permutation (I did 5). Then iterate i from 0 to m, and for each i, perform the rounds of permutation to get the permuted index, j. If j >= n, skip it. Otherwise perform the dithering for this pixel, with x = j % x_res and y = j / x_res.
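A sketch of that iteration as a Python generator; the multipliers and xor constants here are just randomly chosen placeholders, since any odd multipliers and any xor values below m will do:

    import random

    def dizzy_pixel_order(x_res, y_res, rounds=5, seed=0):
        # Yield every pixel (x, y) exactly once, in a pseudorandom order, using O(1) state.
        n = x_res * y_res
        m = 1
        while m < n:  # round the pixel count up to the next power of two
            m *= 2
        rng = random.Random(seed)
        odd_mults = [rng.randrange(1, m, 2) for _ in range(rounds)]
        xor_vals = [rng.randrange(m) for _ in range(rounds)]
        for i in range(m):
            j = i
            for mult, xv in zip(odd_mults, xor_vals):
                j = ((j * mult) % m) ^ xv  # each round is a bijection on [0, m)
            if j < n:  # skip permuted indexes that fall outside the image
                yield j % x_res, j // x_res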
This is a stateless permutation algorithm, and we can actually query the permuted index of any i in O(1), which is over-powered for this use case. We’re only doing one forward iteration, so we don’t need random access. We just need it to have O(1) state. Let me know if you’ve heard of any other randomly permuted iteration algorithms that would work here (maybe an algorithm that sacrifices random access for better statistical properties?).
This image was generated using stateless random iteration and no error table (errors were added directly to the pixels), but I didn’t bother eliminating the bitmap for tracking processed pixels. The results are just as good as the basic algorithm.
Overall I think this algorithm produces results about as good as blue noise dithering (in fact it does a noticeably better job of preserving details). It's a bit more complicated to implement than Floyd–Steinberg or Bayer dithering, but similar in complexity to Riemersma, and a lot simpler than blue noise dithering. The main downside is that error diffusion algorithms aren't suited to real-time applications. The extra memory overhead is another downside, but that tends not to be an issue for offline use cases, and can mostly be mitigated by optimizations.