I have a JPEG that is un-openable in any program:
Opening in Ubuntu Image Viewer yields:
Passing the photo through convert
yields similar results:
$ convert corrupt.jpg out.jpg
convert.im6: Bogus Huffman table definition `corrupt.jpg' @ error/jpeg.c/JPEGErrorHandler/316.
convert.im6: no images defined `out.jpg' @ error/convert.c/ConvertImageCommand/3044.
Running the photo through exiftool
yields:
ExifTool Version Number : 9.46
File Name : corrupt.jpg
Directory : .
File Size : 47 kB
File Modification Date/Time : 2015:04:11 01:31:14-07:00
File Access Date/Time : 2018:05:04 10:26:04-07:00
File Inode Change Date/Time : 2018:05:04 10:26:03-07:00
File Permissions : r--------
File Type : JPEG
MIME Type : image/jpeg
Comment : Y�.�.�..2..Q.Q.
Image Width : 640
Image Height : 480
Encoding Process : Baseline DCT, Huffman coding
Bits Per Sample : 8
Color Components : 3
Y Cb Cr Sub Sampling : YCbCr4:2:2 (2 1)
Image Size : 640x480
Un-corrupted photos containing similar image contents average 45-48k
, so I reckon the photo data itself is inside this JPEG somewhere.
I hosted the photo on S3. You can download it w/ wget
:
wget https://s3.amazonaws.com/jordanarseno.com/corrupt.jpg
I opened the file with hexedit
and found the following:
the photo contents outside of the first few hundred bytes is randomly distributed enough to suggest it contains an image. i.e. I'm not seeing consecutive streams of 0
's of F
's.
it does in-fact start with the FF D8
file signature, as JPEGs ought to.
FF E0
or FF E1
like the list of file signatures says should correspond to JPEGs or JFIFs. Instead it isFF FE
. Which, is in the table, but is listed as: Byte-order mark for text file encoded in little-endian 16-bit Unicode Transfer Format
not long after the FF FE
, I see bytes whose ascii representation is: &'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz
. Seems rather strange for a JPEG. What is this?
likewise, the ASCII string &'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz
appears about 100 bytes later.
FF D9
(the JPEG terminator string) is in the file, but characters do appear after this terminator:
FF D9 5C 72 78 E0 7C 94 CD B2 9C FF 00 C4 BF 53 C0 E7 FE 41 D3 9C FF 00 E3 95 7C F1 B6 92 5F 7A 2B EB 54 AF BF E6 30 FD A0 7F CC 3B 53 E9 FF 00 40 F9 FF 00 F8 8A 4D F7 08 30
Switching over to Windows and using JPEGsnoop yields:
JPEGsnoop 1.8.0 by Calvin Hass
http://www.impulseadventure.com/photo/
-------------------------------------
Filename: [C:\corrupt.jpg]
Filesize: [47760] Bytes
Start Offset: 0x00000000
*** Marker: SOI (xFFD8) ***
OFFSET: 0x00000000
*** Marker: COM (Comment) (xFFFE) ***
OFFSET: 0x00000002
Comment length = 36
Comment=Y.Ò................à.....2..Q.Q...
*** Marker: DQT (xFFDB) ***
Define a Quantization Table.
OFFSET: 0x00000028
Table length = 132
----
Precision=8 bits
Destination ID=0 (Luminance)
DQT, Row #0: 3 2 2 3 4 7 9 10
DQT, Row #1: 2 2 2 3 4 10 10 9
DQT, Row #2: 2 2 3 4 7 10 12 10
DQT, Row #3: 2 3 4 5 9 15 14 11
DQT, Row #4: 3 4 6 10 12 19 18 13
DQT, Row #5: 4 6 9 11 14 18 19 16
DQT, Row #6: 8 11 13 15 18 21 21 17
DQT, Row #7: 12 16 16 17 19 17 18 17
Approx quality factor = 91.45 (scaling=17.09 variance=0.95)
----
Precision=8 bits
Destination ID=1 (Chrominance)
DQT, Row #0: 3 3 4 8 17 17 17 17
DQT, Row #1: 3 4 4 11 17 17 17 17
DQT, Row #2: 4 4 10 17 17 17 17 17
DQT, Row #3: 8 11 17 17 17 17 17 17
DQT, Row #4: 17 17 17 17 17 17 17 17
DQT, Row #5: 17 17 17 17 17 17 17 17
DQT, Row #6: 17 17 17 17 17 17 17 17
DQT, Row #7: 17 17 17 17 17 17 17 17
Approx quality factor = 91.44 (scaling=17.11 variance=0.19)
*** Marker: COM (Comment) (xFFFE) ***
OFFSET: 0x000000AE
Comment length = 5
Comment=...
*** Marker: SOF0 (Baseline DCT) (xFFC0) ***
OFFSET: 0x000000B5
Frame header length = 17
Precision = 8
Number of Lines = 480
Samples per Line = 640
Image Size = 640 x 480
Raw Image Orientation = Landscape
Number of Img components = 3
Component[1]: ID=0x01, Samp Fac=0x21 (Subsamp 1 x 1), Quant Tbl Sel=0x00 (Lum: Y)
Component[2]: ID=0x02, Samp Fac=0x11 (Subsamp 2 x 1), Quant Tbl Sel=0x01 (Chrom: Cb)
Component[3]: ID=0x03, Samp Fac=0x11 (Subsamp 2 x 1), Quant Tbl Sel=0x01 (Chrom: Cr)
*** Marker: DHT (Define Huffman Table) (xFFC4) ***
OFFSET: 0x000000C8
Huffman table length = 418
----
Destination ID = 0
Class = 0 (DC / Lossless Table)
Codes of length 01 bits (000 total):
Codes of length 02 bits (001 total): 00
Codes of length 03 bits (005 total): 01 02 03 04 05
Codes of length 04 bits (001 total): 06
Codes of length 05 bits (001 total): 07
Codes of length 06 bits (001 total): 08
Codes of length 07 bits (001 total): 09
Codes of length 08 bits (001 total): 0A
Codes of length 09 bits (001 total): 0B
Codes of length 10 bits (000 total):
Codes of length 11 bits (000 total):
Codes of length 12 bits (000 total):
Codes of length 13 bits (000 total):
Codes of length 14 bits (000 total):
Codes of length 15 bits (000 total):
Codes of length 16 bits (000 total):
Total number of codes: 012
----
Destination ID = 1
Class = 0 (DC / Lossless Table)
Codes of length 01 bits (000 total):
Codes of length 02 bits (003 total): 13 0E 0F
Codes of length 03 bits (001 total): 10
Codes of length 04 bits (001 total): 11
Codes of length 05 bits (001 total): 12
Codes of length 06 bits (001 total): 12
Codes of length 07 bits (012 total): 12 0B 0D 13 15 13 11 15 10 11 12 11
Codes of length 08 bits (016 total): 01 03 03 03 04 04 04 08 04 04 08 11 0B 0A 0B 11
Codes of length 09 bits (013 total): 11 11 11 11 11 11 11 11 11 11 11 11 11
Codes of length 10 bits (011 total): 11 11 11 11 11 11 11 11 11 11 11
Codes of length 11 bits (012 total): 11 11 11 11 11 11 11 11 11 11 11 01
Codes of length 12 bits (015 total): 01 01 01 01 00 00 00 00 00 00 01 02 03 04 05
Codes of length 13 bits (012 total): 06 07 08 09 0A 0B 10 00 02 01 03 03
Codes of length 14 bits (009 total): 02 04 03 05 05 04 04 00 00
Codes of length 15 bits (010 total): 01 7D 01 02 03 00 04 11 05 12
Codes of length 16 bits (014 total): 21 31 41 06 13 51 61 07 22 71 14 32 81 91
Total number of codes: 131
----
Destination ID = 1
Class = 10 (AC Table)
ERROR: Invalid DHT Class (10). Aborting DHT Load.
ERROR: Expected marker 0xFF, got 0x73 @ offset 0x0000026C. Consider using [Tools->Img Search Fwd/Rev].
*** Searching Compression Signatures ***
Signature: 01FF5BA518B453CC8F224A4C85505196
Signature (Rotated): 01D13AFD01FF0B6EC46EA4081D25BB4D
File Offset: 0 bytes
Chroma subsampling: 2x1
EXIF Make/Model: NONE
EXIF Makernotes: NONE
EXIF Software: NONE
Searching Compression Signatures: (3347 built-in, 0 user(*) )
EXIF.Make / Software EXIF.Model Quality Subsamp Match?
------------------------- ----------------------------------- ---------------- --------------
CAM:[NIKON ] [NIKON D40 ] [FINE ] Yes
Based on the analysis of compression characteristics and EXIF metadata:
ASSESSMENT: Class 1 - Image is processed/edited
This may be a new software editor for the database.
If this file is processed, and editor doesn't appear in list above,
PLEASE ADD TO DATABASE with [Tools->Add Camera to DB]
*** Additional Info ***
NOTE: Data exists after EOF, range: 0x00000000-0x0000BA90 (47760 bytes)
As a last note, the EXIF.Model
identified by JPEGSnoop is incorrect. This photo would have been taken with a VC0706 UART Model: LCF - 23T 0V528
In summary: Is this JPEG recoverable?
The approach used to get this back was more luck than judgement. I think I can explain, though be aware it involves a hex editor...
The Wikipedia page for the syntax of a JPEG file explains that it is made up of a series of segments each started by a two byte marker - 0xFF
and another byte to indicate the type of segment.
The hope was that it was just the Huffman table segment of the file that was wrong - as suggested by the error message. Without needing to understand what a Huffman table is, it was enough to see that the same section on Wikipedia explains it is a 0xFF
0xC4
marker for a Huffman table segment.
Further down the page, it mentions:
The JPEG standard provides general-purpose Huffman tables; encoders may also choose to generate Huffman tables...
Opening up a few other JPEG files found what looks like a standard set of 4 consecutive Huffman table segments - each starting with that 0xFF
0xC4
marker. The sample corrupt.jpg
however just had one Huffman table - from position 0x00c8
to 0x02bc
below.
(Both contain that &'()*56789:CDEFGHIJSTUVWXYZcdefghijstuvwxyz
sequence you mentioned in their Huffman tables. In the corrupt file it appears twice in that single Huffman table, in the 'more conventional' JPEGs it appears in the second and fourth Huffman tables.)
From there, the fixed image is a copy and paste of the standard 4 Huffman tables, in place of that range of bytes in corrupt.jpg
- now from 0x00c8
to 0x0278
in the fixed file.
Because the JPEG format is based around scanning for segments between those 0xff
markers, you can just swap out the Huffman segments - there are no other pointers in the file to worry about. As you said, the rest of the file looked like a plausible JPEG.
Summary of the steps taken:
corrupt.jpg
for FF C4
and note the offsetFF
. If it's another FF C4
(so a second Huffman table) keep goingFF C4
(included) up to but not including the next FF
0x00c8
to 0x0278
in the fixed fileCorrupt Huffman table:
0000-00d0: xx xx xx xx xx xx xx xx-ff c4 01 a2-00 00 01 05 !....... ........
0000-00e0: 01 01 01 01-01 01 00 00-00 00 00 00-00 00 01 02 ........ ........
0000-00f0: 03 04 05 06-07 08 09 0a-0b 01 00 03-01 01 01 01 ........ ........
0000-0100: 0c 10 0d 0b-0c 0f 0c 09-0a 0e 13 0e-0f 10 11 12 ........ ........
0000-0110: 12 12 0b 0d-13 15 13 11-15 10 11 12-11 01 03 03 ........ ........
0000-0120: 03 04 04 04-08 04 04 08-11 0b 0a 0b-11 11 11 11 ........ ........
0000-0130: 11 11 11 11-11 11 11 11-11 11 11 11-11 11 11 11 ........ ........
0000-0140: 11 11 11 11-11 11 11 11-11 11 11 11-11 11 11 11 ........ ........
0000-0150: 01 01 01 01-01 00 00 00-00 00 00 01-02 03 04 05 ........ ........
0000-0160: 06 07 08 09-0a 0b 10 00-02 01 03 03-02 04 03 05 ........ ........
0000-0170: 05 04 04 00-00 01 7d 01-02 03 00 04-11 05 12 21 ......}. .......!
0000-0180: 31 41 06 13-51 61 07 22-71 14 32 81-91 a1 08 23 1A..Qa." q.2....#
0000-0190: 42 b1 c1 15-52 d1 f0 24-33 62 72 82-09 0a 16 17 B...R..$ 3br.....
0000-01a0: 18 19 1a 25-26 27 28 29-2a 34 35 36-37 38 39 3a ...%&'() *456789:
0000-01b0: 43 44 45 46-47 48 49 4a-53 54 55 56-57 58 59 5a CDEFGHIJ STUVWXYZ
0000-01c0: 63 64 65 66-67 68 69 6a-73 74 75 76-77 78 79 7a cdefghij stuvwxyz
0000-01d0: 83 84 85 86-87 88 89 8a-92 93 94 95-96 97 98 99 ........ ........
0000-01e0: 9a a2 a3 a4-a5 a6 a7 a8-a9 aa b2 b3-b4 b5 b6 b7 ........ ........
0000-01f0: b8 b9 ba c2-c3 c4 c5 c6-c7 c8 c9 ca-d2 d3 d4 d5 ........ ........
0000-0200: d6 d7 d8 d9-da e1 e2 e3-e4 e5 e6 e7-e8 e9 ea f1 ........ ........
0000-0210: f2 f3 f4 f5-f6 f7 f8 f9-fa 11 00 02-01 02 04 04 ........ ........
0000-0220: 03 04 07 05-04 04 00 01-02 77 00 01-02 03 11 04 ........ .w......
0000-0230: 05 21 31 06-12 41 51 07-61 71 13 22-32 81 08 14 .!1..AQ. aq."2...
0000-0240: 42 91 a1 b1-c1 09 23 33-52 f0 15 62-72 d1 0a 16 B.....#3 R..br...
0000-0250: 24 34 e1 25-f1 17 18 19-1a 26 27 28-29 2a 35 36 $4.%.... .&'()*56
0000-0260: 37 38 39 3a-43 44 45 46-47 48 49 4a-53 54 55 56 789:CDEF GHIJSTUV
0000-0270: 57 58 59 5a-63 64 65 66-67 68 69 6a-73 74 75 76 WXYZcdef ghijstuv
0000-0280: 77 78 79 7a-82 83 84 85-86 87 88 89-8a 92 93 94 wxyz.... ........
0000-0290: 95 96 97 98-99 9a a2 a3-a4 a5 a6 a7-a8 a9 aa b2 ........ ........
0000-02a0: b3 b4 b5 b6-b7 b8 b9 ba-c2 c3 c4 c5-c6 c7 c8 c9 ........ ........
0000-02b0: ca d2 d3 d4-d5 d6 d7 d8-d9 da e2 e3-e4 e5 e6 e7 ........ ........
0000-02c0: e8 e9 ea f2-f3 f4 f5 f6-f7 f8 f9 fa-xx xx xx xx ........ ........
Then the next two bytes are ff dd
for the start of the next segment:
0000-02c0: xx xx xx xx-xx xx xx xx-xx xx xx xx-ff dd 00 04 ........ ........
This was replaced with the standard 4 general-purpose Huffman tables instead - look for the ff c4
markers:
0000-00d0: xx xx xx xx xx xx xx xx-ff c4 00 1f-00 00 01 05 !....... ........
0000-00e0: 01 01 01 01-01 01 00 00-00 00 00 00-00 00 01 02 ........ ........
0000-00f0: 03 04 05 06-07 08 09 0a-0b ff c4 00-b5 10 00 02 ........ ........
0000-0100: 01 03 03 02-04 03 05 05-04 04 00 00-01 7d 01 02 ........ .....}..
0000-0110: 03 00 04 11-05 12 21 31-41 06 13 51-61 07 22 71 ......!1 A..Qa."q
0000-0120: 14 32 81 91-a1 08 23 42-b1 c1 15 52-d1 f0 24 33 .2....#B ...R..$3
0000-0130: 62 72 82 09-0a 16 17 18-19 1a 25 26-27 28 29 2a br...... ..%&'()*
0000-0140: 34 35 36 37-38 39 3a 43-44 45 46 47-48 49 4a 53 456789:C DEFGHIJS
0000-0150: 54 55 56 57-58 59 5a 63-64 65 66 67-68 69 6a 73 TUVWXYZc defghijs
0000-0160: 74 75 76 77-78 79 7a 83-84 85 86 87-88 89 8a 92 tuvwxyz. ........
0000-0170: 93 94 95 96-97 98 99 9a-a2 a3 a4 a5-a6 a7 a8 a9 ........ ........
0000-0180: aa b2 b3 b4-b5 b6 b7 b8-b9 ba c2 c3-c4 c5 c6 c7 ........ ........
0000-0190: c8 c9 ca d2-d3 d4 d5 d6-d7 d8 d9 da-e1 e2 e3 e4 ........ ........
0000-01a0: e5 e6 e7 e8-e9 ea f1 f2-f3 f4 f5 f6-f7 f8 f9 fa ........ ........
0000-01b0: ff c4 00 1f-01 00 03 01-01 01 01 01-01 01 01 01 ........ ........
0000-01c0: 00 00 00 00-00 00 01 02-03 04 05 06-07 08 09 0a ........ ........
0000-01d0: 0b ff c4 00-b5 11 00 02-01 02 04 04-03 04 07 05 ........ ........
0000-01e0: 04 04 00 01-02 77 00 01-02 03 11 04-05 21 31 06 .....w.. .....!1.
0000-01f0: 12 41 51 07-61 71 13 22-32 81 08 14-42 91 a1 b1 .AQ.aq." 2...B...
0000-0200: c1 09 23 33-52 f0 15 62-72 d1 0a 16-24 34 e1 25 ..#3R..b r...$4.%
0000-0210: f1 17 18 19-1a 26 27 28-29 2a 35 36-37 38 39 3a .....&'( )*56789:
0000-0220: 43 44 45 46-47 48 49 4a-53 54 55 56-57 58 59 5a CDEFGHIJ STUVWXYZ
0000-0230: 63 64 65 66-67 68 69 6a-73 74 75 76-77 78 79 7a cdefghij stuvwxyz
0000-0240: 82 83 84 85-86 87 88 89-8a 92 93 94-95 96 97 98 ........ ........
0000-0250: 99 9a a2 a3-a4 a5 a6 a7-a8 a9 aa b2-b3 b4 b5 b6 ........ ........
0000-0260: b7 b8 b9 ba-c2 c3 c4 c5-c6 c7 c8 c9-ca d2 d3 d4 ........ ........
0000-0270: d5 d6 d7 d8-d9 da e2 e3-e4 e5 e6 e7-e8 e9 ea f2 ........ ........
0000-0280: f3 f4 f5 f6-f7 f8 f9 fa-xx xx xx xx xx xx xx xx ........ .....(..
User contributions licensed under CC BY-SA 3.0