Lossless bug in old COM versions of DicomObjects
Incorrect use of signed arithmetic in JPEG Lossless predictor calculations
Some images (very few, as described below) compressed with DICOM transfer syntax 1.2.840.10008.1.2.4.57 will give wrong pixel data when decompressed by 3rd party software which correctly uses unsigned arithmetic. The converse (images compressed by other systems not decompressing correctly in DicomObjects) is also possible but has never been seen in practice. The bug is completely symmetrical, so images both compressed and decompressed by faulty versions of DicomObjects will display completely normally.
Affected versions: all up to and including 4.2.199.x
The full JPEG Lossless specification as used by 1.2.840.10008.1.2.4.57 allows 7 different values (1 to 7) for the "Predictor Selection Value" (PSV), which controls how the predicted value of a pixel (which is then used to calculate a difference) is derived from the values of surrounding pixels. Values 1-3 are simple (an immediately adjacent pixel value) and 4 is a little more complicated (2 pixels added together less a third). Values 5-7 are more complex, as each involves a halving step (half the difference between 2 pixel values for 5 and 6, half the sum of 2 pixel values for 7), and it is here that signed arithmetic (as used incorrectly by affected versions of DicomObjects) gives different results from the unsigned arithmetic required by the standard. Only images with these PSV values are therefore affected, and in practice few images happen to be most efficiently compressed using them. Moreover, due to the way that the calculations happen, only 16 bit data is vulnerable to this problem. In summary, the only images affected are those where:
- Transfer syntax 1.2.840.10008.1.2.4.57 is used
- PSV = 5, 6 or 7 is selected by the compression software as being the most efficient
- The pixel data uses full 16 bits
- Decompression is done by a different (correct) implementation.
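The difference between the two arithmetics can be illustrated with a short sketch. It uses the predictor formulas from the JPEG Lossless specification (ITU-T T.81, Table H.1), where Ra is the pixel to the left, Rb the pixel above and Rc the pixel above-left. The function names are hypothetical, and the "signed" variant assumes that the faulty behaviour is equivalent to reinterpreting each 16 bit sample as a two's-complement value before the calculation. It is an illustration only, not the DicomObjects source.

```python
MASK16 = 0xFFFF


def as_signed16(value):
    """Reinterpret a 16 bit sample as two's-complement (assumed model of the bug)."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def predictor(psv, ra, rb, rc):
    """JPEG Lossless predictors 1 to 7 (ra = left, rb = above, rc = above-left)."""
    if psv == 1:
        return ra
    if psv == 2:
        return rb
    if psv == 3:
        return rc
    if psv == 4:
        return ra + rb - rc
    if psv == 5:
        return ra + ((rb - rc) >> 1)  # >> is an arithmetic shift on Python ints
    if psv == 6:
        return rb + ((ra - rc) >> 1)
    if psv == 7:
        return (ra + rb) >> 1
    raise ValueError("PSV must be in the range 1 to 7")


def predict_unsigned(psv, ra, rb, rc):
    """Prediction using unsigned samples, reduced modulo 2^16."""
    return predictor(psv, ra, rb, rc) & MASK16


def predict_signed(psv, ra, rb, rc):
    """Prediction after reinterpreting the samples as signed, reduced modulo 2^16."""
    return predictor(psv, as_signed16(ra), as_signed16(rb), as_signed16(rc)) & MASK16


if __name__ == "__main__":
    ra, rb, rc = 100, 40000, 100  # rb uses the top bit of the 16 bit range
    for psv in range(1, 8):
        u = predict_unsigned(psv, ra, rb, rc)
        s = predict_signed(psv, ra, rb, rc)
        print(psv, u, s, "same" if u == s else "DIFFERENT")
```

With these sample values, PSV 1 to 4 agree, because addition and subtraction give the same result modulo 2^16 either way, while PSV 5 and 7 disagree, because halving does not commute with the 16 bit wrap-around (PSV 6 disagrees for other inputs, for example with ra and rb swapped). If every sample is below 32768 the two functions always agree, which is consistent with only full 16 bit data being affected.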
Correction of DicomObjects to use proper unsigned arithmetic was of course trivial. The harder problem is achieving interoperability with both other software and existing badly compressed images. This is compounded by the fact that most compressed pixel data does not have a marker showing which implementation did the compression (there is of course the DICOM implementation name/UID in the Part 10 meta-header or passed during negotiation, but this only identifies the last application to write or send the data, not necessarily the one which did the compression). A further complication is that the DicomObjects implementation was derived from a well-known existing 3rd party codec, so other DICOM software may well suffer from the same problem. The chosen solution has 3 components:
When compressing, the default behaviour of DicomObjects (from 4.2.200.x onwards) will be to restrict the PSV to the range 1 to 4, totally avoiding this issue for future compressed images, whether handled by DicomObjects or other software. This may slightly reduce the compression efficiency, but experiments show that very few images are affected, and even then by less than 2%, so this is a small price to pay for improved interoperability. Developers will have options allowing them either to choose the old method (PSV=1 to 7, signed arithmetic) or to insist that the full range of PSV values be considered. In addition, the version of DicomObjects compressing the data will be included as a comment in the JPEG data stream, and if the developer insists on using the old signed method, there will also be a comment mentioning this, with a link to this web page.
When decompressing, the default behaviour (where PSV 5, 6 or 7 is found together with 16 bit data) will be to attempt decompression using both signed and unsigned arithmetic, and to use whichever produces the smaller range of pixel values. Whilst not guaranteed to identify the correct method in all circumstances, this has proved 100% accurate in all testing so far. There is also an override available to programmers to insist on signed or unsigned arithmetic (which could for instance be used to update an entire existing archive).
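The "smaller range" rule can be sketched with a toy predictive codec. Everything here is hypothetical: the function names, the omission of Huffman coding (only the prediction differences are kept), the modelling of the old behaviour as a two's-complement reinterpretation of each sample, and the choice to prefer the standard unsigned result on a tie. It is a minimal sketch of the idea, not the DicomObjects implementation.

```python
MASK16 = 0xFFFF


def _signed16(value):
    """Reinterpret a 16 bit sample as two's-complement (assumed model of the bug)."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def _predict(rows, y, x, psv, signed):
    """Predict pixel (y, x) from already known neighbours, modulo 2^16."""
    if y == 0 and x == 0:
        return 0x8000                      # 2^(P-1) for 16 bit precision
    if y == 0:
        return rows[0][x - 1]              # first row: left neighbour
    if x == 0:
        return rows[y - 1][0]              # first column: neighbour above
    ra, rb, rc = rows[y][x - 1], rows[y - 1][x], rows[y - 1][x - 1]
    if signed:
        ra, rb, rc = _signed16(ra), _signed16(rb), _signed16(rc)
    if psv == 5:
        pred = ra + ((rb - rc) >> 1)
    elif psv == 6:
        pred = rb + ((ra - rc) >> 1)
    elif psv == 7:
        pred = (ra + rb) >> 1
    else:
        raise ValueError("this sketch only covers PSV 5, 6 and 7")
    return pred & MASK16


def encode(pixels, psv, signed):
    """Return the prediction differences (entropy coding is omitted)."""
    return [[(value - _predict(pixels, y, x, psv, signed)) & MASK16
             for x, value in enumerate(row)]
            for y, row in enumerate(pixels)]


def decode(diffs, psv, signed):
    """Rebuild pixels from differences using the chosen arithmetic."""
    out = []
    for y, row in enumerate(diffs):
        out.append([])
        for x, diff in enumerate(row):
            out[y].append((_predict(out, y, x, psv, signed) + diff) & MASK16)
    return out


def pixel_range(rows):
    flat = [value for row in rows for value in row]
    return max(flat) - min(flat)


def decode_auto(diffs, psv):
    """Decode both ways and keep the result with the smaller pixel range."""
    unsigned_result = decode(diffs, psv, signed=False)
    signed_result = decode(diffs, psv, signed=True)
    if pixel_range(signed_result) < pixel_range(unsigned_result):
        return "signed", signed_result
    return "unsigned", unsigned_result     # tie goes to the standard method (assumption)
```

For a small image whose values straddle 32768, such as `[[32760, 32770], [32765, 32775]]` with PSV 7, decoding with the wrong arithmetic produces an outlier pixel and therefore a much larger range, so `decode_auto` recovers the original image whichever arithmetic was used to encode it.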
Normally, DicomObjects avoids decompression and re-compression wherever possible, but where any possibly affected images are identified (where PSV 5, 6 or 7, together with 16 bit data is found), then decompression and re-compression is forced, according to the rules above.
The likely true effects of this bug on images are actually very small, as evidenced by the fact that it had been present in DicomObjects for 7 years without being noticed until 2005. Although in theory any image with >8 bits could be affected when saved using the affected transfer syntax, it appears, in practice, that only images using a full 16 bits of data and which themselves are truly signed pixel data are ever assigned one of the problematic PSVs (5, 6 or 7). Moreover, as this compression bug is symmetrical, images can always be retrieved and re-compressed correctly. The larger impact is likely to be the work involved by system vendors to identify affected images (however few) in their archives and to manage transition to a correct version. The properties and options described below are designed to help in this respect.
New DicomObjects Methods and Properties:
A string including the text "PSV=x", where x is the PSV used (this sub-string will only be present for lossless JPEG compressed images). It will also include (separated by \ characters) any comment fields found in the JPEG data stream. Note that accessing this property does not require reading an entire image off disk (only the JPEG header is read, provided that the normal memory mapping is used), so it should be possible to scan through an archive very quickly to find and correct affected images.
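A scan of an archive could interpret this string along the following lines. This is a sketch under assumptions: the function names, the `bits_stored` parameter and the sample string are made up for illustration, and the exact layout of the string (beyond the "PSV=x" text and the \ separators described above) is not specified here.

```python
import re

AFFECTED_PSVS = {5, 6, 7}


def parse_compression_info(text):
    """Split the property string into its PSV (or None) and any JPEG comments."""
    psv = None
    comments = []
    for field in text.split("\\"):
        match = re.search(r"PSV=(\d)", field)
        if match and psv is None:
            psv = int(match.group(1))      # first field carrying the PSV
        elif field.strip():
            comments.append(field.strip()) # anything else is treated as a comment
    return psv, comments


def may_be_affected(text, bits_stored):
    """True if the image matches the PSV and 16 bit criteria for this bug."""
    psv, _ = parse_compression_info(text)
    return psv in AFFECTED_PSVS and bits_stored == 16


if __name__ == "__main__":
    sample = "PSV=6\\hypothetical comment text"  # made-up example value
    print(parse_compression_info(sample))
    print(may_be_affected(sample, bits_stored=16))
```

Images flagged in this way are only candidates: whether they were actually compressed with signed arithmetic still has to be decided by the decompression rules or by an explicit override.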
Integers specifying how to (de)compress the image if Lossless (de)compression is required:
- 0: Default behaviour as above
- 1: Always use unsigned (standard) arithmetic
- 2: Always use signed (old) arithmetic
The default value for these properties is 0, but this can be overridden globally using the LosslessCompressionMode and LosslessDecompressionMode registry values.
The 4th (quality) parameter may be used to specify an explicit PSV (this is useful mainly for testing)
- Mathieu Malaterre for finding the problem in the first place
- David Clunie for noticing my implementation name in a problematic image and bringing it to my attention
- Yves Martell for helping me to identify the fault, and for suggesting the "smallest pixel range" decompression method.