The Back Story
For the last year, I've been developing a cross-platform (i.e. Linux, Mac and Windows) Python-based desktop
application called "MlRawViewer".
It was born out of the troubles I had working with the large raw video files produced using Magic Lantern with my Canon 7D
(see Life Without Artifacts for more details).
The latest version, MlRawViewer 1.3.2, can now be used to review and convert, in real time, Magic Lantern .RAW and .MLV formats, as well
as the emerging industry standard CinemaDNG format (used by BlackMagic cameras and others), into either high quality video (ProRes)
or CinemaDNG format.
The quantity of data raw recording produces really can be overwhelming. I fill a 64GB CF card in 15 minutes. In a year I've collected
over 6TB of files. I've started to treat 3TB HDs the way I treated DV tape a decade ago!
For practical reasons, Magic Lantern currently cannot apply any compression to the data it records. Literally the only thing it can do
is write it to disk as fast as possible.
But after recording, there is time and processing power available to do more.
One big question for me has been how to archive the data in a sensible way without losing the benefits of the format. Because of that, I was interested to understand in detail how this problem is handled elsewhere.
Raw images - meaning images still in the "bayer" colour pattern of the image sensor - have been a
feature of the digital photography world for a decade. Adobe tried (with limited success)
to standardise the wild west of file formats for raw photos with the DNG ("Digital Negative")
format - itself an extension of the venerable TIFF structured image format - in 2004.
One of the features DNG has had from the beginning is a lossless compressed mode capable of squeezing
raw images down to as little as 50% (though more often 60-70%) of their original uncompressed size without
throwing away any of the detail.
I had implemented (uncompressed) DNG reading and writing for MlRawViewer a few months ago, so I already knew
that format quite well. But the lossless compressed mode was a bit of a mystery to me.
It is specified in the DNG standard, but not really in much detail - only one paragraph.
The feature is based on "Lossless JPEG" - I'll explain more about that in a moment.
Getting to grips with how exactly DNG and Lossless JPEG fitted together took a bit of studying,
and a fair bit of code writing. Since I thought those details might be of some interest to one or two
people in the future, I wanted to write some of them down here.
What is Lossless JPEG?
JPEG is probably familiar to you as a lossy image format which suffers from blocky artifacts
when you try to squeeze it too much. It has been in use since about 1992, when it was standardised by the
Joint Photographic Experts Group - the J.P.E.G. At this point it doesn't show much sign of ever going away, despite repeated attempts by many groups to replace it.
What is less well known, is that the original JPEG (1992) spec actually included a lossless mode
quite unlike the familiar blocky (8x8 DCT-based) lossy mode.
(That original Lossless JPEG (1992) is not to be confused with the more complex "JPEG-LS" which
came a few years later, or the lossless modes of wavelet-based JPEG 2000 which are even more complex, or indeed the lossless mode of Microsoft's JPEG XR from 2007.)
Despite appearing as a bit of an afterthought in the original JPEG spec, the lossless mode was actually
(IMVHO) quite forward thinking. Unlike the 8-bit and little-used 12-bit modes of lossy JPEG, lossless can handle any bit depth between 2 and 16 bits. It also includes a range of prediction functions to help the compression, and at least one of those turns out to be very appropriate for working with raw data when used in a specific way.
How does Lossless JPEG (1992) compress?
LJ92 - as I'm going to call it here - works only on grayscale images of arbitrary width and height.
The image is compressed one pixel at a time along each row, until all rows have been processed.
For each pixel, a prediction is made using a combination of up to three neighbours: the nearest pixel earlier in the row (to the left), the pixel directly above, and the pixel above and to the left.
There are 7 predefined prediction functions, and only one of them is used for all the pixels in each "scan" - meaning a block of pixels. It's possible and common for an entire image to be contained in a
single scan.
The predicted value is subtracted from the current pixel, and that (hopefully small)
difference is what is encoded.
First the binary magnitude of the difference (called the SSSS in the spec) is checked and given a
code from 0-16 corresponding in most cases to the number of extra bits which must be written to
fully encode the difference. For example, 0 is SSSS=0 with no extra bits; -1 or +1 is SSSS=1 with 1 extra bit; -3,-2,+2,+3 are SSSS=2 with 2 extra bits; and so on. The only special case is a difference of 32768, which is SSSS=16 but no extra bits.
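As a concrete sketch (not the actual MlRawViewer code), computing SSSS in C is just counting magnitude bits:

    /* Map a prediction difference to its SSSS category - the number of
       bits needed to hold its magnitude. 0 -> 0, -1/+1 -> 1,
       -3,-2,+2,+3 -> 2, ... and 32768 -> the special 16. */
    static int ssss_of(int diff) {
        unsigned int a = diff < 0 ? (unsigned int)(-diff) : (unsigned int)diff;
        int ssss = 0;
        while (a) {
            a >>= 1;
            ssss++;
        }
        return ssss;
    }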
The whole image must be scanned first, and then the frequency of each SSSS value is analysed.
A huffman coding table is built from the frequencies, and used to assign variable bit-length codes
to each SSSS value - short codes for the common values, long for the uncommon.
The huffman table itself is then encoded into its own block of the file.
The scan header defines the predictor used, and then each pixel difference is written as the huffman code for its SSSS followed by the extra bits.
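To make the bitstream layout concrete, here is a hedged sketch of the emission step, reusing ssss_of from above - put_bits and the huff table are hypothetical stand-ins for whatever bit writer an encoder uses, not liblj92 internals:

    /* Hypothetical bit writer and pre-built code table, for illustration. */
    typedef struct { unsigned int code; int length; } HuffCode;
    extern HuffCode huff[17];               /* one code per SSSS value 0..16 */
    extern void put_bits(unsigned int bits, int count);

    static void encode_diff(int diff) {
        int ssss = ssss_of(diff);
        put_bits(huff[ssss].code, huff[ssss].length); /* huffman code first */
        if (ssss > 0 && ssss < 16) {
            /* Negative differences store the low SSSS bits of (diff - 1),
               so the top extra bit distinguishes positive from negative. */
            unsigned int extra = (unsigned int)(diff < 0 ? diff - 1 : diff);
            put_bits(extra & ((1u << ssss) - 1), ssss);
        }
        /* SSSS == 16 is the special 32768 case: no extra bits at all. */
    }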
There are a few tricky things.
One is that whenever a byte containing all 1 bits (0xFF) is written, a padding byte of
zeros (0x00) is inserted - this is needed because 0xFF otherwise marks the start of a JPEG format block.
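In code, the stuffing rule is tiny (again a sketch, not liblj92's actual writer):

    /* Emit one byte of entropy-coded data; follow any 0xFF with a 0x00
       stuffing byte so it cannot be mistaken for a JPEG marker. */
    static void emit_byte(unsigned char **out, unsigned char b) {
        *(*out)++ = b;
        if (b == 0xFF)
            *(*out)++ = 0x00;
    }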
Another is that at the start of the image, there is no data for the predictors. For the very first pixel,
the prediction used is half the full-scale value for the bit depth being encoded (e.g. 32768 for 16 bits). For the first pixel of each subsequent row,
the value of the pixel above is used.
A decoder starts by reading and reconstructing the huffman table, then uses it to decode the pixels
of the scan, observing the special prediction rules for the first pixels, and discarding the padding 0x00 byte after each 0xFF byte.
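Rolled together, a decoder's inner loop looks roughly like this sketch - get_huff_ssss and get_bits are hypothetical helpers reading the (de-stuffed) bitstream, and predictor 1 (the pixel to the left) is used for simplicity:

    extern int get_huff_ssss(void);  /* hypothetical: decode one SSSS code */
    extern int get_bits(int n);      /* hypothetical: read n raw bits */

    static void decode_scan(unsigned short *img, int w, int h, int bits) {
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int pred, ssss, diff = 0;
                if (y == 0 && x == 0)
                    pred = 1 << (bits - 1);    /* half full scale */
                else if (x == 0)
                    pred = img[(y - 1) * w];   /* first in row: pixel above */
                else
                    pred = img[y * w + x - 1]; /* predictor 1: pixel to left */
                ssss = get_huff_ssss();
                if (ssss == 16) {
                    diff = 32768;              /* special case: no extra bits */
                } else if (ssss > 0) {
                    diff = get_bits(ssss);
                    if (diff < (1 << (ssss - 1)))  /* top bit clear: negative */
                        diff += 1 - (1 << ssss);
                }
                img[y * w + x] = (unsigned short)((pred + diff) & 0xFFFF);
            }
        }
    }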
That's it.
How does DNG use LJ92?
DNG contains raw bayer images, not grayscale images. It would have been possible to split the
image into 4 components and encode each one as a grayscale LJ92 image. But that's not how it works.
Instead, it uses a couple of clever hacks.
The first is based on the observation that adjacent rows of bayer data are not similar, but every other row is. That means that if you take a bayer image and interpret it as an image twice as wide but only half as high, it will look like two squashed copies of the original side by side - but now every row is similar to the one above it. Red pixels are above red pixels, green above green, etc.
This can be achieved by simply telling the encoder that the image is a different size than it really is.
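No pixel data moves or gets copied at all; only the dimensions passed to the encoder change. A sketch of the idea (encode_scan here stands in for any LJ92 encoder entry point, not a real liblj92 call):

    /* Hypothetical encoder entry point. */
    extern void encode_scan(const unsigned short *pixels,
                            int width, int height, int bitdepth);

    /* Reinterpret a W x H bayer frame as 2W x (H/2): every row of the
       reinterpreted image has the same colour pattern as the row above. */
    static void encode_bayer(const unsigned short *bayer, int w, int h, int bits) {
        encode_scan(bayer, w * 2, h / 2, bits);
    }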
That reinterpreting step enables the LJ92 predictors to be used, since the pixels above are now the same colour as the current pixel. But the pixels one step earlier are still different. This is where one of the original LJ92 predictors becomes most useful.
LJ92 predictor type 6 is defined as:
Px = Rb + ((Ra - Rc)/2)
That means the prediction is the pixel above (Rb), plus half the difference between the two pixels in the column to the left: Ra in the current row and Rc in the row above. But how does this work when the pixels to the left are a different colour - for example red instead of green?
The trick here is similar to one used when reconstructing (demosaicing) bayer images into full colour images of the same resolution. Hue is assumed not to change very often in an image. That means that nearby pixels of different colours will often change at about the same rate, so it is useful to predict the change in the current colour from the change in neighbouring colours.
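In C, predictor 6 is a one-liner (a sketch; real implementations typically do the divide with an arithmetic shift):

    /* Predictor 6: the pixel above, corrected by half the vertical change
       seen in the column to the left.
       Ra = left neighbour, Rb = above, Rc = above-left. */
    static int predict6(int Ra, int Rb, int Rc) {
        return Rb + ((Ra - Rc) >> 1);
    }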
One extra element on top of these two tricks is that the tiling feature of DNG (and TIFF originally)
can be used to encode regions of the image - for instance side by side - as separate LJ92 images. This
allows parallel encoding and decoding of the tiles which can speed up both operations.
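A hypothetical sketch of how a DNG writer might drive that (all names here are mine, purely illustrative, not liblj92 or TIFF API):

    /* Hypothetical types and helpers, for illustration only. */
    typedef struct { unsigned short *pixels; int width, height; } Tile;
    extern void encode_tile_lj92(const unsigned short *pixels, int w, int h,
                                 int bitdepth, unsigned char **out, int *outlen);
    extern void dng_write_tile(int index, unsigned char *data, int len);

    /* Compress each DNG tile as an independent LJ92 stream; the loop body
       could just as well be handed to a pool of worker threads. */
    static void compress_tiles(Tile *tiles, int tile_count, int bitdepth) {
        for (int t = 0; t < tile_count; t++) {
            unsigned char *encoded;
            int encoded_len;
            encode_tile_lj92(tiles[t].pixels,
                             tiles[t].width * 2,   /* bayer width trick */
                             tiles[t].height / 2,
                             bitdepth, &encoded, &encoded_len);
            dng_write_tile(t, encoded, encoded_len);
        }
    }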
Putting it all together
To work out those details, I had to read a few docs and write some code.
I made an implementation of an LJ92 encoder and decoder in a small, liberally
licensed C library called "liblj92". It currently lives inside the MlRawViewer git repo,
but it has no dependencies on the rest of MlRawViewer.
Then I integrated liblj92 into MlRawViewer's existing DNG code so that it could read
and write compressed DNGs. The result was released in MlRawViewer 1.3.
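For the curious, decoding with liblj92 looks roughly like the sketch below. I'm paraphrasing the lj92.h entry points from memory, so treat the exact signatures as assumptions and check the header:

    #include <stdint.h>
    #include <stdlib.h>
    #include "lj92.h"  /* from the MlRawViewer repo */

    /* Open an LJ92 stream and decode it into a contiguous 16-bit buffer.
       Signatures approximate - verify against lj92.h. */
    int decode_example(uint8_t *data, int datalen, uint16_t **out) {
        lj92 handle;
        int width, height, bitdepth, ret;
        ret = lj92_open(&handle, data, datalen, &width, &height, &bitdepth);
        if (ret != LJ92_ERROR_NONE)
            return ret;
        *out = malloc((size_t)width * height * sizeof(uint16_t));
        if (!*out) { lj92_close(handle); return -1; }
        /* The write/skip lengths let a caller decode into a 2D tile;
           here the whole image is written contiguously. */
        ret = lj92_decode(handle, *out, width * height, 0, NULL, 0);
        lj92_close(handle);
        return ret;
    }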
But how does this help with my original archiving question? I'm not planning to convert
all my Magic Lantern files to DNGs since those are not very convenient to work with.
My hope is that LJ92 could be added to the Magic Lantern MLV as a standard compression format,
with the mlv_dump tool able to generate effectively compressed files for archiving.
MLVs using LJ92 would be 50-70% of the uncompressed originals, would still be playable in real
time (actually, playing becomes easier when the files are smaller), would have no detail
lost, and would be very easy to convert into compressed DNGs simply by copying the embedded raw
LJ92 images as-is.
Then I would be able to free up a bunch of my backup 3TB HDs... for more raw