Why are these patterns in Huffman coding bitstreams in (Photoshop-produced?) JPG files?


This is a question, out of curiosity, about some patterns I see in JPG files when I look at them in a hex editor. I guess it is a question about the JPEG file format: why does this part not look like "random noise", as the rest of the Huffman-coded data does?

Here goes:

This 136-bit (17-byte) pattern shows up in some JPG files produced by Adobe Photoshop (I do not know whether Photoshop is the only application that produces them):

F7 5E EB DE FD D7 BA F7 BF 75 EE BD EF DD 7B AF 7B

It appears in several places in a single file. Sometimes it occurs just once; other times it is repeated 8 or 12 times, making up blocks of 1088 or 1632 bits. To be precise, it is actually a 68-bit pattern, repeated two or more times:

F7 5E EB DE FD D7 BA F7 B

11110111010111101110101111011110111111011101011110111010111101111011
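A quick way to double-check the arithmetic above (136 bits = 17 bytes = two copies of the 68-bit pattern) is to expand the hex into bits. The sketch below uses a hypothetical helper, `hex_to_bits`, written only for this check; it is not part of any JPEG tool.

```python
def hex_to_bits(hex_str: str) -> str:
    """Expand a hex string (spaces allowed) into '0'/'1' characters, 4 bits per digit."""
    digits = hex_str.replace(" ", "")
    return "".join(format(int(d, 16), "04b") for d in digits)

block = hex_to_bits("F7 5E EB DE FD D7 BA F7 BF 75 EE BD EF DD 7B AF 7B")  # 17 bytes
pattern = hex_to_bits("F7 5E EB DE FD D7 BA F7 B")  # 17 hex digits

print(len(block), len(pattern))  # 136 68
print(block == pattern * 2)      # True
print(pattern)                   # the 68-bit string quoted above
```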

AFAIK, from reading a bit about the JPG file structure (and verifying it in hex), JPG file structures begin with FF xx markers. There are no such FF xx markers either immediately before or immediately after these 68-bit patterns.
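Some background on why no FF xx shows up around the pattern: the whole entropy-coded scan is one long run of data between the scan header and the next marker. Inside it, the encoder stuffs a 0x00 after every 0xFF byte it produces, so a decoder can tell data from markers; apart from restart markers (FF D0 to FF D7), the next real marker ends the scan. The sketch below assumes the scan data has already been cut out as `bytes`; `unstuff` and `find_markers` are hypothetical helper names, and fill bytes (FF FF) are simply skipped rather than handled the way a full parser would.

```python
def unstuff(scan: bytes) -> bytes:
    """Drop the 0x00 that a JPEG encoder stuffs after every 0xFF data byte.

    Meant for a stretch of entropy-coded data that contains no markers.
    """
    out = bytearray()
    i = 0
    while i < len(scan):
        out.append(scan[i])
        if scan[i] == 0xFF and i + 1 < len(scan) and scan[i + 1] == 0x00:
            i += 2  # skip the stuffed zero byte
        else:
            i += 1
    return bytes(out)

def find_markers(scan: bytes) -> list[tuple[int, int]]:
    """Return (offset, second byte) for each FF xx where xx is neither 00 (stuffing) nor FF (fill)."""
    return [(i, scan[i + 1]) for i in range(len(scan) - 1)
            if scan[i] == 0xFF and scan[i + 1] not in (0x00, 0xFF)]

pattern = bytes.fromhex("F75EEBDEFDD7BAF7BF75EEBDEFDD7BAF7B")
print(0xFF in pattern)                                   # False: no stuffing needed
print(unstuff(bytes.fromhex("F9FF00FB")).hex().upper())  # F9FFFB
print(find_markers(bytes.fromhex("F75EFFD9")))           # [(2, 217)], i.e. FF D9 at offset 2
```

The repeated 17-byte block contains no 0xFF at all, so nothing marker-like can appear inside or around it. The F9 FF 00 FB sample is taken from the start of the black-image scan shown in the answer, where the FF 00 is exactly such a stuffed byte.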

Using Breakpoint Hex Workshop, these patterns are very easy to spot in the "Data Visualizer" window: while the rest of the Huffman bitstream looks like "noise", there are suddenly blocks showing clear patterns.

Also (I am not sure how relevant this is):

Earlier, I also noticed this type of pattern in CR2 files (Canon RAW files), though there the pattern was a much simpler 40-bit one:

73 9C E7 39 CE

0111 0011 1001 1100 1110 0111 0011 1001 1100 1110

If I adjust the spaces, it becomes this:

01110 01110 01110 01110 01110 01110 01110 01110

As you can see, this is actually a repeating 5-bit pattern, and it was repeated several hundred times at each place it appeared in the CR2 files. CR2 is also a compressed format, but lossless. Then again, the Huffman coding in JPG is also a kind of lossless compression, if I have understood it correctly.
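Both the 40-bit and the 136-bit block sizes follow from the same arithmetic: a bit pattern with period p bits only lines up with byte boundaries again after lcm(p, 8) bits, so in a hex editor it shows up as a repeating block of lcm(p, 8) / 8 bytes. For CR2, lcm(5, 8) = 40 bits = 5 bytes; for the JPG case, lcm(68, 8) = 136 bits = 17 bytes. The sketch below (hypothetical helper names) rebuilds the CR2 bytes from the 5-bit pattern.

```python
from math import lcm  # Python 3.9+

def repeat_bits_to_bytes(pattern: str, bit_count: int) -> bytes:
    """Repeat a '0'/'1' pattern up to bit_count bits and pack them MSB-first into bytes."""
    if bit_count % 8:
        raise ValueError("bit_count must be a multiple of 8")
    reps = -(-bit_count // len(pattern))  # ceiling division
    bits = (pattern * reps)[:bit_count]
    return int(bits, 2).to_bytes(bit_count // 8, "big")

def byte_period(bit_period: int) -> int:
    """Bytes after which a stream with the given bit period repeats byte-for-byte."""
    return lcm(bit_period, 8) // 8

print(byte_period(5), byte_period(68))                     # 5 17
print(repeat_bits_to_bytes("01110", 40).hex(" ").upper())  # 73 9C E7 39 CE
```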

I find it very strange that compressed streams contain these patterns of what seem to me to be "wasted" bits.
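The "wasted" bits look less strange once you consider what Huffman coding does: it replaces each symbol with a fixed prefix code word and concatenates the words. Every code word is at least one bit long, and the coder keeps no memory of earlier symbols, so if the encoder emits the same symbol sequence for many consecutive blocks, the output is the same bit string over and over. The toy sketch below builds a Huffman code with Python's `heapq`; the symbol names (`DC0`, `EOB`, ...) and their frequencies are hypothetical stand-ins, not JPEG's actual symbols or tables.

```python
import heapq
from itertools import count

def build_huffman_code(freqs: dict[str, int]) -> dict[str, str]:
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

def encode(symbols: list[str], code: dict[str, str]) -> str:
    """Concatenate the code word of every symbol; nothing carries over between symbols."""
    return "".join(code[s] for s in symbols)

# Hypothetical alphabet and frequencies, for illustration only.
code = build_huffman_code({"DC0": 40, "EOB": 40, "AC1": 10, "AC2": 6, "DC5": 4})
block = ["DC0", "EOB"]             # hypothetical symbols for one unchanged block
stream = encode(block * 8, code)   # eight identical blocks in a row

print(len(encode(block, code)))            # 3
print(stream == encode(block, code) * 8)   # True: the output is strictly periodic
```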

I have uploaded one of the JPG files here: http://i.imgur.com/t0mi7vo.jpg (it's just a simple screenshot of some files in a folder). The Huffman code bitstream runs from offset 0x0000027C to the end, and you can see one instance of the repeating pattern at, for example, offset 0x0001604A.

jpeg
huffman-code
bitstream
asked on Stack Overflow Nov 2, 2014 by HackeyStack • edited Nov 2, 2014 by ForguesR

2 Answers


Correct me if I'm wrong, but I think this could be some kind of 'blueprint' for checking whether Photoshop has been used. Maybe all of this is piracy related.

answered on Stack Overflow Nov 2, 2014 by Jelman

User3344003, thank you very, very much for your answer; it is 99.9% correct! :-)

These patterns are, as you wrote, related to large areas of uniform color!

However, it is actually the color black (0,0,0) that creates this particular pattern:

F75EEBDEFDD7BAF7BF75EEBDEFDD7BAF7B

...or, when split into two 68-bit parts:

F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
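Why would a large area of one color produce identical bits over and over? In baseline JPEG each 8x8 block goes through a 2-D DCT, and for a perfectly uniform block every coefficient except the DC one is zero; the DC coefficient is then coded as the difference from the previous block's DC, which is 0 for every block after the first. The sketch below uses the textbook floating-point DCT formula from the JPEG spec on level-shifted samples (value - 128). It is an idealization: Photoshop's color conversion, DCT implementation, quantization and Huffman tables are not modeled, so it does not predict the exact F75E... bits.

```python
import math

def dct_8x8(block: list[list[float]]) -> list[list[float]]:
    """Forward 2-D DCT with the normalization used by JPEG (ITU-T T.81)."""
    def c(k: int) -> float:
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            s = 0.0
            for y in range(8):
                for x in range(8):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[v][u] = 0.25 * c(u) * c(v) * s
    return out

def uniform_block(sample: int) -> list[list[float]]:
    """An 8x8 block of one 8-bit sample value, level-shifted by 128 as JPEG does."""
    return [[float(sample - 128)] * 8 for _ in range(8)]

coeffs = dct_8x8(uniform_block(0))  # luma (Y) of pure black (0,0,0) is 0
dc = coeffs[0][0]
max_ac = max(abs(coeffs[v][u]) for v in range(8) for u in range(8) if (u, v) != (0, 0))
print(round(dc, 6))   # -1024.0
print(max_ac < 1e-9)  # True: all AC terms vanish (up to float rounding)

# DC is coded differentially, so identical blocks give a difference of 0 after the first.
dcs = [dc] * 4
diffs = [dcs[0]] + [b - a for a, b in zip(dcs, dcs[1:])]
print([round(d) for d in diffs])  # [-1024, 0, 0, 0]
```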

To see it in action:

1) Create a 32 pixel x 32 pixel image filled with pure black (0,0,0) in Photoshop.

2) Choose File -> Save for Web & Devices

3) Select JPEG with Maximum (Quality = 100) and Blur = 0, turn OFF all of the Progressive, Optimized, Embed Color Profile and Convert to sRGB options, and set Metadata = None.

Now when you look at the image in a hex editor, it will show this Huffman Coding bitstream:

FFDA
000C03010002110311003F00F9FF00FB
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF7B
F75EEBDEFDD7BAF
FFD9

As you can see, it contains nearly 8 instances of the 68-bit pattern.
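Strictly speaking, the first bytes of this listing are not Huffman-coded data yet: FF DA is the Start of Scan (SOS) marker, and the next 12 bytes (the length 00 0C counts itself) are the scan header. The entropy-coded data starts right after them and runs until the FF D9 (End of Image) marker. A minimal parser for that header, following the SOS layout in the JPEG spec (ITU-T T.81, Annex B), might look like this; `parse_sos` is a hypothetical helper.

```python
def parse_sos(header: bytes) -> dict:
    """Parse a JPEG Start-of-Scan header (the bytes right after the FF DA marker)."""
    length = int.from_bytes(header[0:2], "big")  # includes the two length bytes
    n_components = header[2]
    comps = []
    for i in range(n_components):
        cid, tables = header[3 + 2 * i], header[4 + 2 * i]
        # High nibble selects the DC Huffman table, low nibble the AC table.
        comps.append({"id": cid, "dc_table": tables >> 4, "ac_table": tables & 0x0F})
    p = 3 + 2 * n_components
    return {
        "length": length,
        "components": comps,
        "Ss": header[p], "Se": header[p + 1],  # spectral selection start/end
        "Ah": header[p + 2] >> 4, "Al": header[p + 2] & 0x0F,
        "data_offset": length,  # entropy-coded data begins right after the header
    }

seg = bytes.fromhex("000C03010002110311003F00F9FF00FB")
info = parse_sos(seg)
for c in info["components"]:
    print(c["id"], c["dc_table"], c["ac_table"])  # 1 0 0 / 2 1 1 / 3 1 1
print(info["Ss"], info["Se"])                      # 0 63
print(seg[info["data_offset"]:].hex().upper())     # F9FF00FB
```

So the scan covers all three components, with component 1 using Huffman tables 0 and components 2 and 3 sharing tables 1, and the coded data begins at F9 FF 00 FB.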

Similarly, if you instead create a 32 pixels x 32 pixels image filled with pure white (255,255,255) (and save it as a JPEG in the same way as above), you get this Huffman Coding bitstream:

FFDA
000C03010002110311003F00DFE3DFBA
F75EF7EEBDD7BDFBA
F75EF7EEBDD7BDFBA
F75EF7EEBDD7BDFBA
F75EF7EEBDD7BDFBA
F75EF7EEBDD7BDFBA
F75EF7EEBDD7BDFBA
F75EF7EEBDD7BDFBA
F75EF7EEBDD7  F
FFD9
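To check how periodic the black and white listings really are, one can expand their seven full repeated lines into bits (leaving out the SOS header and the final partial line) and search for the smallest shift at which the bit string matches itself. `smallest_period` is a hypothetical helper. For both listings it reports 34 bits, so each 68-bit unit is itself two copies of a 34-bit unit; 34 bits is not a whole number of hex digits, which is why the repetition only becomes visible in hex every lcm(34, 4) = 68 bits (17 hex digits).

```python
def hex_to_bits(hex_str: str) -> str:
    """Expand hex digits into a '0'/'1' string, 4 bits per digit."""
    return "".join(format(int(d, 16), "04b") for d in hex_str)

def smallest_period(bits: str) -> int:
    """Smallest shift p > 0 such that bits[i] == bits[i + p] for every valid i."""
    for p in range(1, len(bits)):
        if bits[p:] == bits[:-p]:
            return p
    return len(bits)

black = hex_to_bits("F75EEBDEFDD7BAF7B" * 7)  # seven full lines of the black listing
white = hex_to_bits("F75EF7EEBDD7BDFBA" * 7)  # seven full lines of the white listing

print(smallest_period(black), smallest_period(white))  # 34 34
unit = hex_to_bits("F75EEBDEFDD7BAF7B")
print(unit[:34] == unit[34:])  # True: the 68-bit unit is two 34-bit halves
```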

I also created a 64 pixels x 64 pixels image divided down the middle: the left 32 pixels x 64 pixels pure black (0,0,0) and the right 32 pixels x 64 pixels pure white (255,255,255). I saved it as JPEG with Quality = 100 and the same other settings, and got this Huffman Coding bitstream:

FFDA
000C03010002110311003F00F9FF00FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAF803FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAF803FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAF803FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAF803FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAF803FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAF803FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAF803FB
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BAFBFC7B
F75EEBDEFDD7BAF   7B
F75EEBDEFDD7BA
FFD9

When I found this out, my first thought was: "But isn't Huffman coding supposed to be more efficient than... this!? 8 identical patterns in the 32 pixels x 32 pixels solid-color images, and 16 + 8 + 8 identical ones in the 64 pixels x 64 pixels half-black/half-white one? Why not store the pattern just once and then use pointers, like 'use this particular pattern here, here, there and... there'?"
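The "pointers" idea is essentially what LZ77-style compressors do, and JPEG's entropy coding has no such stage: baseline JPEG uses Huffman coding only (the standard also defines arithmetic coding, but it is rarely used), so each repeated block is coded in full. Python's standard `zlib` module (DEFLATE, which combines LZ77 back-references with Huffman coding) can illustrate the difference on a synthetic buffer made of the 17-byte block repeated; this is only an illustration of the idea, not something a JPEG decoder would accept.

```python
import zlib

# Synthetic buffer: the 17-byte block from the question repeated 100 times,
# standing in for a long run of identical blocks. Not a valid JPEG scan.
block = bytes.fromhex("F75EEBDEFDD7BAF7BF75EEBDEFDD7BAF7B")
data = block * 100  # 1700 bytes

packed = zlib.compress(data, 9)  # DEFLATE: LZ77 back-references + Huffman
print(len(data), len(packed))    # the repetition collapses into a few back-references
assert zlib.decompress(packed) == data  # still lossless
```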

Then I remembered that these JPEGs are actually pretty unusual, in that they were all saved with Quality = 100.

So Quality = 100 seems to be the other factor needed to see these F75E... patterns.

To verify this, I again made a 32 pixels x 32 pixels pure black (0,0,0) image, but this time saved it with Quality = 0. This image got a much shorter Huffman Coding bitstream, which also showed a pattern, but a very different one:

FFDA
000C03010002110311003F00F99
55540555501
55540555503F
FFD9
answered on Stack Overflow Nov 2, 2014 by HackeyStack • edited Nov 2, 2014 by HackeyStack

User contributions licensed under CC BY-SA 3.0