Is a JPEG with a bogus Huffman table recoverable?

I have a JPEG that won't open in any program:

Opening it in Ubuntu's Image Viewer yields:

could not load image. bogus huffman table definition

Passing the photo through convert yields similar results:

$ convert corrupt.jpg out.jpg
convert.im6: Bogus Huffman table definition `corrupt.jpg' @ error/jpeg.c/JPEGErrorHandler/316.
convert.im6: no images defined `out.jpg' @ error/convert.c/ConvertImageCommand/3044.

Running the photo through exiftool yields:

ExifTool Version Number         : 9.46
File Name                       : corrupt.jpg
Directory                       : .
File Size                       : 47 kB
File Modification Date/Time     : 2015:04:11 01:31:14-07:00
File Access Date/Time           : 2018:05:04 10:26:04-07:00
File Inode Change Date/Time     : 2018:05:04 10:26:03-07:00
File Permissions                : r--------
File Type                       : JPEG
MIME Type                       : image/jpeg
Comment                         : Y�.�.�..2..Q.Q.
Image Width                     : 640
Image Height                    : 480
Encoding Process                : Baseline DCT, Huffman coding
Bits Per Sample                 : 8
Color Components                : 3
Y Cb Cr Sub Sampling            : YCbCr4:2:2 (2 1)
Image Size                      : 640x480

Uncorrupted photos with similar content average 45 to 48 kB, so I reckon the image data itself is somewhere inside this JPEG.

I hosted the photo on S3. You can download it with wget:

wget https://s3.amazonaws.com/jordanarseno.com/corrupt.jpg

I opened the file with hexedit and found the following:

  • beyond the first few hundred bytes, the contents look random enough to suggest they contain an image, i.e. I'm not seeing long runs of 0s or Fs.

  • it does in fact start with the FF D8 file signature, as JPEGs ought to.

  • the next two bytes are not FF E0 or FF E1, which the list of file signatures says correspond to JPEGs or JFIFs. Instead they are FF FE, which is in the table, but is listed as:

Byte-order mark for text file encoded in little-endian 16-bit Unicode Transfer Format

  • not long after the FF FE, I see bytes whose ASCII representation is &'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz. That seems rather strange for a JPEG. What is this?

  • likewise, the ASCII string &'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz appears about 100 bytes later.

  • FF D9 (the JPEG end-of-image marker) is in the file, but more bytes appear after it:

    FF D9 5C 72 78 E0 7C 94 CD B2 9C FF 00 C4 BF 53 C0 E7 FE 41 D3 9C FF 00 E3 95 7C F1 B6 92 5F 7A 2B EB 54 AF BF E6 30 FD A0 7F CC 3B 53 E9 FF 00 40 F9 FF 00 F8 8A 4D F7 08 30
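The two signature checks in the list above (FF D8 at the start, bytes after FF D9) can be reproduced with a short Python sketch. The name `check_markers` is hypothetical. Keep in mind that an FF D9 byte pair can also occur inside a segment's payload (a comment, for example), so a match is not necessarily the real end of the image.

```python
SOI = b"\xff\xd8"  # start-of-image marker
EOI = b"\xff\xd9"  # end-of-image marker


def check_markers(data: bytes) -> dict:
    """Check for SOI at offset 0, list every FF D9 pair, and count the
    bytes that follow the last one."""
    eoi_offsets = []
    i = data.find(EOI)
    while i >= 0:
        eoi_offsets.append(i)
        i = data.find(EOI, i + 1)
    last = eoi_offsets[-1] if eoi_offsets else None
    return {
        "starts_with_soi": data.startswith(SOI),
        "eoi_offsets": eoi_offsets,
        # None means no FF D9 pair was found at all
        "bytes_after_last_eoi": None if last is None else len(data) - (last + 2),
    }
```

Usage would be something like `check_markers(open("corrupt.jpg", "rb").read())`.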

Switching over to Windows and using JPEGsnoop yields:

JPEGsnoop 1.8.0 by Calvin Hass
  http://www.impulseadventure.com/photo/
  -------------------------------------

  Filename: [C:\corrupt.jpg]
  Filesize: [47760] Bytes

Start Offset: 0x00000000
*** Marker: SOI (xFFD8) ***
  OFFSET: 0x00000000

*** Marker: COM (Comment) (xFFFE) ***
  OFFSET: 0x00000002
  Comment length = 36
    Comment=Y.Ò................à.....2..Q.Q...

*** Marker: DQT (xFFDB) ***
  Define a Quantization Table.
  OFFSET: 0x00000028
  Table length = 132
  ----
  Precision=8 bits
  Destination ID=0 (Luminance)
    DQT, Row #0:   3   2   2   3   4   7   9  10 
    DQT, Row #1:   2   2   2   3   4  10  10   9 
    DQT, Row #2:   2   2   3   4   7  10  12  10 
    DQT, Row #3:   2   3   4   5   9  15  14  11 
    DQT, Row #4:   3   4   6  10  12  19  18  13 
    DQT, Row #5:   4   6   9  11  14  18  19  16 
    DQT, Row #6:   8  11  13  15  18  21  21  17 
    DQT, Row #7:  12  16  16  17  19  17  18  17 
    Approx quality factor = 91.45 (scaling=17.09 variance=0.95)
  ----
  Precision=8 bits
  Destination ID=1 (Chrominance)
    DQT, Row #0:   3   3   4   8  17  17  17  17 
    DQT, Row #1:   3   4   4  11  17  17  17  17 
    DQT, Row #2:   4   4  10  17  17  17  17  17 
    DQT, Row #3:   8  11  17  17  17  17  17  17 
    DQT, Row #4:  17  17  17  17  17  17  17  17 
    DQT, Row #5:  17  17  17  17  17  17  17  17 
    DQT, Row #6:  17  17  17  17  17  17  17  17 
    DQT, Row #7:  17  17  17  17  17  17  17  17 
    Approx quality factor = 91.44 (scaling=17.11 variance=0.19)

*** Marker: COM (Comment) (xFFFE) ***
  OFFSET: 0x000000AE
  Comment length = 5
    Comment=...

*** Marker: SOF0 (Baseline DCT) (xFFC0) ***
  OFFSET: 0x000000B5
  Frame header length = 17
  Precision = 8
  Number of Lines = 480
  Samples per Line = 640
  Image Size = 640 x 480
  Raw Image Orientation = Landscape
  Number of Img components = 3
    Component[1]: ID=0x01, Samp Fac=0x21 (Subsamp 1 x 1), Quant Tbl Sel=0x00 (Lum: Y)
    Component[2]: ID=0x02, Samp Fac=0x11 (Subsamp 2 x 1), Quant Tbl Sel=0x01 (Chrom: Cb)
    Component[3]: ID=0x03, Samp Fac=0x11 (Subsamp 2 x 1), Quant Tbl Sel=0x01 (Chrom: Cr)

*** Marker: DHT (Define Huffman Table) (xFFC4) ***
  OFFSET: 0x000000C8
  Huffman table length = 418
  ----
  Destination ID = 0
  Class = 0 (DC / Lossless Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (001 total): 00 
    Codes of length 03 bits (005 total): 01 02 03 04 05 
    Codes of length 04 bits (001 total): 06 
    Codes of length 05 bits (001 total): 07 
    Codes of length 06 bits (001 total): 08 
    Codes of length 07 bits (001 total): 09 
    Codes of length 08 bits (001 total): 0A 
    Codes of length 09 bits (001 total): 0B 
    Codes of length 10 bits (000 total): 
    Codes of length 11 bits (000 total): 
    Codes of length 12 bits (000 total): 
    Codes of length 13 bits (000 total): 
    Codes of length 14 bits (000 total): 
    Codes of length 15 bits (000 total): 
    Codes of length 16 bits (000 total): 
    Total number of codes: 012

  ----
  Destination ID = 1
  Class = 0 (DC / Lossless Table)
    Codes of length 01 bits (000 total): 
    Codes of length 02 bits (003 total): 13 0E 0F 
    Codes of length 03 bits (001 total): 10 
    Codes of length 04 bits (001 total): 11 
    Codes of length 05 bits (001 total): 12 
    Codes of length 06 bits (001 total): 12 
    Codes of length 07 bits (012 total): 12 0B 0D 13 15 13 11 15 10 11 12 11 
    Codes of length 08 bits (016 total): 01 03 03 03 04 04 04 08 04 04 08 11 0B 0A 0B 11 
    Codes of length 09 bits (013 total): 11 11 11 11 11 11 11 11 11 11 11 11 11 
    Codes of length 10 bits (011 total): 11 11 11 11 11 11 11 11 11 11 11 
    Codes of length 11 bits (012 total): 11 11 11 11 11 11 11 11 11 11 11 01 
    Codes of length 12 bits (015 total): 01 01 01 01 00 00 00 00 00 00 01 02 03 04 05 
    Codes of length 13 bits (012 total): 06 07 08 09 0A 0B 10 00 02 01 03 03 
    Codes of length 14 bits (009 total): 02 04 03 05 05 04 04 00 00 
    Codes of length 15 bits (010 total): 01 7D 01 02 03 00 04 11 05 12 
    Codes of length 16 bits (014 total): 21 31 41 06 13 51 61 07 22 71 14 32 81 91 
    Total number of codes: 131

  ----
  Destination ID = 1
  Class = 10 (AC Table)
ERROR: Invalid DHT Class (10). Aborting DHT Load.

ERROR: Expected marker 0xFF, got 0x73 @ offset 0x0000026C. Consider using [Tools->Img Search Fwd/Rev].

*** Searching Compression Signatures ***

  Signature:           01FF5BA518B453CC8F224A4C85505196
  Signature (Rotated): 01D13AFD01FF0B6EC46EA4081D25BB4D
  File Offset:         0 bytes
  Chroma subsampling:  2x1
  EXIF Make/Model:     NONE
  EXIF Makernotes:     NONE
  EXIF Software:       NONE

  Searching Compression Signatures: (3347 built-in, 0 user(*) )

          EXIF.Make / Software        EXIF.Model                            Quality           Subsamp Match?
          -------------------------   -----------------------------------   ----------------  --------------
     CAM:[NIKON                    ] [NIKON D40                          ] [FINE            ] Yes              

  Based on the analysis of compression characteristics and EXIF metadata:

  ASSESSMENT: Class 1 - Image is processed/edited

  This may be a new software editor for the database.
  If this file is processed, and editor doesn't appear in list above,
  PLEASE ADD TO DATABASE with [Tools->Add Camera to DB]


*** Additional Info ***
NOTE: Data exists after EOF, range: 0x00000000-0x0000BA90 (47760 bytes)

As a last note, the EXIF.Model that JPEGsnoop's signature search matched (NIKON D40) is incorrect. This photo would have been taken with a VC0706 UART camera, model: LCF-23T 0V528.


In summary: Is this JPEG recoverable?

asked on Stack Overflow May 4, 2018 by Jordan Arseno

1 Answer

The approach used to get this back was more luck than judgement. I think I can explain, though be aware it involves a hex editor...

The Wikipedia page on JPEG file syntax explains that a JPEG is made up of a series of segments, each started by a two-byte marker: 0xFF followed by a byte that indicates the segment type.
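A minimal Python sketch of walking that segment structure, assuming every header segment carries a big-endian two-byte length that counts itself but not the two marker bytes (true for the segments between SOI and SOS here; it ignores optional 0xFF fill bytes). `walk_segments` is a hypothetical helper name. Because it trusts each length field, on corrupt.jpg it should stop with an error at offset 0x026c, the same offset JPEGsnoop reports in its "Expected marker 0xFF, got 0x73" message.

```python
from typing import Iterator, Tuple


def walk_segments(data: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (offset, marker_byte, length) for each header segment,
    stopping after the start-of-scan (FF DA) segment."""
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("missing SOI marker")
    pos = 2  # first segment follows SOI
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected 0xFF at offset {pos:#06x}, got {data[pos]:#04x}")
        marker = data[pos + 1]
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        yield pos, marker, length
        if marker == 0xDA:  # SOS: entropy-coded image data follows
            return
        pos += 2 + length  # skip marker bytes plus the segment body
```

For example, `for off, m, n in walk_segments(data): print(f"{off:#06x} FF{m:02X} len={n}")` prints a marker listing similar to JPEGsnoop's.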

The hope was that only the Huffman table segment was wrong, as the error message suggested. Without needing to understand what a Huffman table is, it was enough to see in the same section of the Wikipedia page that a Huffman table segment starts with the marker 0xFF 0xC4.

Further down the page, it mentions:

The JPEG standard provides general-purpose Huffman tables; encoders may also choose to generate Huffman tables...

Opening a few other JPEG files revealed what looks like a standard set of four consecutive Huffman table segments, each starting with that 0xFF 0xC4 marker. The sample corrupt.jpg, however, had just one Huffman table segment, from offset 0x00c8 to 0x02bc below.
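One way to see that this single segment is wrong is to compare where its length field says it ends with where the next 0xFF actually appears. With the offsets above, the declared length 0x01a2 (418) puts the end at 0x00c8 + 2 + 0x01a2 = 0x026c, exactly where JPEGsnoop gave up, while the next marker really sits at 0x02bc. A sketch, with `dht_extent` as a hypothetical name and assuming the table data itself contains no 0xFF byte:

```python
from typing import Tuple


def dht_extent(data: bytes, start: int) -> Tuple[int, int]:
    """For a DHT segment whose FF C4 marker is at `start`, return
    (declared_end, next_ff): where the length field says the segment
    ends, and where the next 0xFF byte actually occurs."""
    if data[start:start + 2] != b"\xff\xc4":
        raise ValueError(f"no DHT marker at offset {start:#06x}")
    length = int.from_bytes(data[start + 2:start + 4], "big")
    declared_end = start + 2 + length  # length excludes the 2 marker bytes
    next_ff = data.find(b"\xff", start + 4)  # search after the length field
    return declared_end, next_ff
```

If the two values differ, the length field and the actual table contents disagree, which is one way a decoder ends up reporting a bogus table.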

(Both contain that &'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz sequence you mentioned within their Huffman tables. In the corrupt file it appears twice in that single Huffman table segment; in the 'more conventional' JPEGs it appears in the second and fourth Huffman tables.)
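As for why readable text shows up at all: each symbol in a JPEG AC Huffman table is a single byte RRRRSSSS, where the high nibble is a run of zero coefficients and the low nibble is the size category of the next non-zero coefficient. For runs 2 to 7 many of those bytes happen to fall in the printable ASCII range. This Python sketch rebuilds the string from (run, size) pairs; the ranges are read off the standard chrominance AC table in the hex dump below (26 27 ... 7a), not derived independently.

```python
def ac_symbol(run: int, size: int) -> int:
    """Pack a (zero-run, size-category) pair into one AC symbol byte."""
    return (run << 4) | size


# (run, sizes) in the order they appear near the end of the standard
# chrominance AC table's symbol list
RUN_SIZES = [(2, range(6, 11)), (3, range(5, 11))] + [(r, range(3, 11)) for r in (4, 5, 6, 7)]

symbols = bytes(ac_symbol(run, size) for run, sizes in RUN_SIZES for size in sizes)
print(symbols.decode("ascii"))  # &'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz
```

So the "text" is just a contiguous stretch of Huffman symbol values, which is why it appears once per AC table.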

From there, the fix was to copy and paste the four standard Huffman tables over that range of bytes in corrupt.jpg; in the fixed file they now occupy 0x00c8 to 0x0278.

Because a JPEG is read as a sequence of segments, each introduced by an 0xFF marker, you can simply swap out the Huffman segments: there are no other offsets or pointers in the file that need updating. As you said, the rest of the file looked like a plausible JPEG.


Summary of the steps taken:

  • Hex search corrupt.jpg for FF C4 and note the offset
  • Hex search for the next FF. If it is another FF C4 (i.e. a second Huffman table), keep going
  • Delete everything from the first FF C4 (inclusive) up to, but not including, that next FF
  • Replace it with the 'standard 4 Huffman tables'. These are the bytes in the last sample below, or they can be copied from 0x00c8 to 0x0278 in the fixed file
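The steps above can be sketched in Python. `replace_huffman_tables` is a hypothetical name. Like the manual procedure, it deliberately ignores the (bogus) length field and just searches for 0xFF bytes, so it assumes no earlier segment happens to contain the bytes FF C4 and that the Huffman table data contains no 0xFF.

```python
def replace_huffman_tables(corrupt: bytes, tables: bytes) -> bytes:
    """Cut from the first FF C4 up to (not including) the next 0xFF that
    does not start another FF C4 segment, and put `tables` there instead.

    `tables` is assumed to be one or more complete DHT segments, e.g.
    copied out of a healthy JPEG from the same encoder."""
    start = corrupt.find(b"\xff\xc4")
    if start < 0:
        raise ValueError("no DHT (FF C4) segment found")
    pos = start + 2
    while True:
        end = corrupt.find(b"\xff", pos)
        if end < 0 or end + 1 >= len(corrupt):
            raise ValueError("no marker found after the Huffman segment(s)")
        if corrupt[end + 1] == 0xC4:
            pos = end + 2  # another Huffman table: keep going
            continue
        break
    return corrupt[:start] + tables + corrupt[end:]
```

For example, slicing `fixed[0xC8:0x278]` out of the repaired file gives the four standard segments (432 bytes), which can be passed as `tables` for another file with the same damage.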

Corrupt Huffman table:

0000-00d0:  xx xx xx xx xx xx xx xx-ff c4 01 a2-00 00 01 05  !....... ........
0000-00e0:  01 01 01 01-01 01 00 00-00 00 00 00-00 00 01 02  ........ ........
0000-00f0:  03 04 05 06-07 08 09 0a-0b 01 00 03-01 01 01 01  ........ ........
0000-0100:  0c 10 0d 0b-0c 0f 0c 09-0a 0e 13 0e-0f 10 11 12  ........ ........
0000-0110:  12 12 0b 0d-13 15 13 11-15 10 11 12-11 01 03 03  ........ ........
0000-0120:  03 04 04 04-08 04 04 08-11 0b 0a 0b-11 11 11 11  ........ ........
0000-0130:  11 11 11 11-11 11 11 11-11 11 11 11-11 11 11 11  ........ ........
0000-0140:  11 11 11 11-11 11 11 11-11 11 11 11-11 11 11 11  ........ ........
0000-0150:  01 01 01 01-01 00 00 00-00 00 00 01-02 03 04 05  ........ ........
0000-0160:  06 07 08 09-0a 0b 10 00-02 01 03 03-02 04 03 05  ........ ........
0000-0170:  05 04 04 00-00 01 7d 01-02 03 00 04-11 05 12 21  ......}. .......!
0000-0180:  31 41 06 13-51 61 07 22-71 14 32 81-91 a1 08 23  1A..Qa." q.2....#
0000-0190:  42 b1 c1 15-52 d1 f0 24-33 62 72 82-09 0a 16 17  B...R..$ 3br.....
0000-01a0:  18 19 1a 25-26 27 28 29-2a 34 35 36-37 38 39 3a  ...%&'() *456789:
0000-01b0:  43 44 45 46-47 48 49 4a-53 54 55 56-57 58 59 5a  CDEFGHIJ STUVWXYZ
0000-01c0:  63 64 65 66-67 68 69 6a-73 74 75 76-77 78 79 7a  cdefghij stuvwxyz
0000-01d0:  83 84 85 86-87 88 89 8a-92 93 94 95-96 97 98 99  ........ ........
0000-01e0:  9a a2 a3 a4-a5 a6 a7 a8-a9 aa b2 b3-b4 b5 b6 b7  ........ ........
0000-01f0:  b8 b9 ba c2-c3 c4 c5 c6-c7 c8 c9 ca-d2 d3 d4 d5  ........ ........
0000-0200:  d6 d7 d8 d9-da e1 e2 e3-e4 e5 e6 e7-e8 e9 ea f1  ........ ........
0000-0210:  f2 f3 f4 f5-f6 f7 f8 f9-fa 11 00 02-01 02 04 04  ........ ........
0000-0220:  03 04 07 05-04 04 00 01-02 77 00 01-02 03 11 04  ........ .w......
0000-0230:  05 21 31 06-12 41 51 07-61 71 13 22-32 81 08 14  .!1..AQ. aq."2...
0000-0240:  42 91 a1 b1-c1 09 23 33-52 f0 15 62-72 d1 0a 16  B.....#3 R..br...
0000-0250:  24 34 e1 25-f1 17 18 19-1a 26 27 28-29 2a 35 36  $4.%.... .&'()*56
0000-0260:  37 38 39 3a-43 44 45 46-47 48 49 4a-53 54 55 56  789:CDEF GHIJSTUV
0000-0270:  57 58 59 5a-63 64 65 66-67 68 69 6a-73 74 75 76  WXYZcdef ghijstuv
0000-0280:  77 78 79 7a-82 83 84 85-86 87 88 89-8a 92 93 94  wxyz.... ........
0000-0290:  95 96 97 98-99 9a a2 a3-a4 a5 a6 a7-a8 a9 aa b2  ........ ........
0000-02a0:  b3 b4 b5 b6-b7 b8 b9 ba-c2 c3 c4 c5-c6 c7 c8 c9  ........ ........
0000-02b0:  ca d2 d3 d4-d5 d6 d7 d8-d9 da e2 e3-e4 e5 e6 e7  ........ ........
0000-02c0:  e8 e9 ea f2-f3 f4 f5 f6-f7 f8 f9 fa-xx xx xx xx  ........ ........

The next two bytes, ff dd, start the next segment (DRI, Define Restart Interval):

0000-02c0:  xx xx xx xx-xx xx xx xx-xx xx xx xx-ff dd 00 04  ........ ........

This range was replaced with the four standard general-purpose Huffman tables (look for the ff c4 markers):

0000-00d0:  xx xx xx xx xx xx xx xx-ff c4 00 1f-00 00 01 05  !....... ........
0000-00e0:  01 01 01 01-01 01 00 00-00 00 00 00-00 00 01 02  ........ ........
0000-00f0:  03 04 05 06-07 08 09 0a-0b ff c4 00-b5 10 00 02  ........ ........
0000-0100:  01 03 03 02-04 03 05 05-04 04 00 00-01 7d 01 02  ........ .....}..
0000-0110:  03 00 04 11-05 12 21 31-41 06 13 51-61 07 22 71  ......!1 A..Qa."q
0000-0120:  14 32 81 91-a1 08 23 42-b1 c1 15 52-d1 f0 24 33  .2....#B ...R..$3
0000-0130:  62 72 82 09-0a 16 17 18-19 1a 25 26-27 28 29 2a  br...... ..%&'()*
0000-0140:  34 35 36 37-38 39 3a 43-44 45 46 47-48 49 4a 53  456789:C DEFGHIJS
0000-0150:  54 55 56 57-58 59 5a 63-64 65 66 67-68 69 6a 73  TUVWXYZc defghijs
0000-0160:  74 75 76 77-78 79 7a 83-84 85 86 87-88 89 8a 92  tuvwxyz. ........
0000-0170:  93 94 95 96-97 98 99 9a-a2 a3 a4 a5-a6 a7 a8 a9  ........ ........
0000-0180:  aa b2 b3 b4-b5 b6 b7 b8-b9 ba c2 c3-c4 c5 c6 c7  ........ ........
0000-0190:  c8 c9 ca d2-d3 d4 d5 d6-d7 d8 d9 da-e1 e2 e3 e4  ........ ........
0000-01a0:  e5 e6 e7 e8-e9 ea f1 f2-f3 f4 f5 f6-f7 f8 f9 fa  ........ ........
0000-01b0:  ff c4 00 1f-01 00 03 01-01 01 01 01-01 01 01 01  ........ ........
0000-01c0:  00 00 00 00-00 00 01 02-03 04 05 06-07 08 09 0a  ........ ........
0000-01d0:  0b ff c4 00-b5 11 00 02-01 02 04 04-03 04 07 05  ........ ........
0000-01e0:  04 04 00 01-02 77 00 01-02 03 11 04-05 21 31 06  .....w.. .....!1.
0000-01f0:  12 41 51 07-61 71 13 22-32 81 08 14-42 91 a1 b1  .AQ.aq." 2...B...
0000-0200:  c1 09 23 33-52 f0 15 62-72 d1 0a 16-24 34 e1 25  ..#3R..b r...$4.%
0000-0210:  f1 17 18 19-1a 26 27 28-29 2a 35 36-37 38 39 3a  .....&'( )*56789:
0000-0220:  43 44 45 46-47 48 49 4a-53 54 55 56-57 58 59 5a  CDEFGHIJ STUVWXYZ
0000-0230:  63 64 65 66-67 68 69 6a-73 74 75 76-77 78 79 7a  cdefghij stuvwxyz
0000-0240:  82 83 84 85-86 87 88 89-8a 92 93 94-95 96 97 98  ........ ........
0000-0250:  99 9a a2 a3-a4 a5 a6 a7-a8 a9 aa b2-b3 b4 b5 b6  ........ ........
0000-0260:  b7 b8 b9 ba-c2 c3 c4 c5-c6 c7 c8 c9-ca d2 d3 d4  ........ ........
0000-0270:  d5 d6 d7 d8-d9 da e2 e3-e4 e5 e6 e7-e8 e9 ea f2  ........ ........
0000-0280:  f3 f4 f5 f6-f7 f8 f9 fa-xx xx xx xx xx xx xx xx  ........ .....(..
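To sanity-check a repair like this, one can list the tables inside each DHT segment of the output. If the splice matches the dump above, a sketch like the following would be expected to report segment lengths 0x1f, 0xb5, 0x1f and 0xb5, with (class, id) pairs (0, 0), (1, 0), (0, 1) and (1, 1), and 12, 162, 12 and 162 symbol values. Class 0 is DC and class 1 is AC; any other class is rejected, which is the same complaint JPEGsnoop raised ("Invalid DHT Class (10)"). `dht_tables` is a hypothetical name.

```python
from typing import List, Tuple


def dht_tables(data: bytes) -> List[Tuple[int, int, int, int]]:
    """Return (segment_length, table_class, table_id, n_values) for every
    Huffman table found in the header, stopping at start of scan."""
    out = []
    pos = 2  # skip SOI
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected marker at offset {pos:#06x}")
        marker = data[pos + 1]
        seg_len = int.from_bytes(data[pos + 2:pos + 4], "big")
        if marker == 0xDA:  # start of scan: header is over
            break
        if marker == 0xC4:
            p, end = pos + 4, pos + 2 + seg_len
            while p < end:  # one DHT segment may hold several tables
                tc, th = data[p] >> 4, data[p] & 0x0F
                if tc not in (0, 1):
                    raise ValueError(f"invalid DHT class {tc} at offset {p:#06x}")
                n_values = sum(data[p + 1:p + 17])  # 16 code-length counts
                out.append((seg_len, tc, th, n_values))
                p += 1 + 16 + n_values
        pos += 2 + seg_len
    return out
```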
answered on Stack Overflow May 8, 2018 by df778899 • edited Oct 25, 2018 by Cœur

User contributions licensed under CC BY-SA 3.0