Coding an enhanced LSB reverser

-5

I've come across a steganographic .PNG image whose IDAT data is divided into 12 chunks (the last one slightly smaller). I need to explain the structure of the problem before I get to the actual question, so please do not mark this as off-topic; it is not. I just have to explain the idea behind the script so that I can get to the issue itself.

The image definitely has data embedded in it. The data seems to have been concealed in the enhanced LSB values, that is, with the high-order bits of each pixel eliminated and only the least significant bit kept. Every byte is then 0 or 1, and since 0 and 1 on a 256-value range give no visible color, a 0 stays at 0 and a 1 becomes the maximum value, 255.

I've analyzed this image in many different ways, but I don't see anything odd beyond the utter lack of one value in any of the three color channels (RGB) and the heightened presence of another value in one third of the color values. Studying these and replacing bytes has given me nothing, however, and I am at a loss as to whether this avenue is even worth pursuing.
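The mapping described above (0 stays 0, 1 becomes 255) can be sketched as follows. This is only an illustration of the forward direction; the function names and the sample pixel are made up for the example and do not come from the image in question.

```python
def enhance_lsb(value: int) -> int:
    """Map an 8-bit channel value to 0 or 255 according to its least significant bit."""
    return 255 if value & 1 else 0


def enhance_pixel(pixel):
    """Apply the mapping to every channel of an (R, G, B) tuple."""
    return tuple(enhance_lsb(channel) for channel in pixel)


# Hypothetical sample pixel: R=200 (even), G=13 (odd), B=54 (even)
enhanced = enhance_pixel((200, 13, 54))
print(enhanced)
```

Only the parity of each channel survives this step, which is what makes the result easy to inspect by eye.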

Hence, I'm looking into developing a script in either Python, PHP, or C/C++ that would reverse the process and 'restore' the enhanced LSBs.

I've converted it to a 24-bit .BMP, and judging by the red curve from a chi-square steganalysis, it's certain that there is hidden data within the file.

[Two images: chi-square steganalysis output]

First, there are a little more than 8 vertical zones, which means that a little more than 8 kB of data is hidden. One pixel can hide three bits (one in the LSB of each RGB channel), so we can hide (98x225)x3 bits. To get the number of kilobytes, we divide by 8 and by 1024: ((98x225)x3)/(8x1024), which comes to around 8.1 kilobytes. But that isn't the case here.
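The capacity arithmetic above, spelled out as a quick check. The variable names are mine; the figures 98, 225 and 3 are the ones quoted in the paragraph.

```python
n_pixels = 98 * 225            # image dimensions quoted above
bits_per_pixel = 3             # one LSB in each of R, G and B
capacity_bits = n_pixels * bits_per_pixel
capacity_kb = capacity_bits / (8 * 1024)   # bits -> bytes -> kilobytes

print(capacity_bits)           # total number of LSBs available
print(round(capacity_kb, 2))   # capacity in kilobytes
```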

The analysis of the APP0 and APP1 markers of a .JPG version of the file also gives some odd output:

Start Offset: 0x00000000
*** Marker: SOI (xFFD8) ***
  OFFSET: 0x00000000

*** Marker: APP0 (xFFE0) ***
  OFFSET: 0x00000002
  length     = 16
  identifier = [JFIF]
  version    = [1.1]
  density    = 96 x 96 DPI (dots per inch)
  thumbnail  = 0 x 0

*** Marker: APP1 (xFFE1) ***
  OFFSET: 0x00000014
  length          = 58
  Identifier      = [Exif]
  Identifier TIFF = x[4D 4D 00 2A 00 00 00 08 ]
  Endian          = Motorola (big)
  TAG Mark x002A  = x[002A]

  EXIF IFD0 @ Absolute x[00000026]
    Dir Length = x[0003]
    [IFD0.x5110                          ] = 
    [IFD0.x5111                          ] = 0
    [IFD0.x5112                          ] = 0
    Offset to Next IFD = [00000000]

*** Marker: DQT (xFFDB) ***
  Define a Quantization Table.
  OFFSET: 0x00000050
  Table length = 67
  ----
  Precision=8 bits
  Destination ID=0 (Luminance)
    DQT, Row #0:   2   1   1   2   3   5   6   7 
    DQT, Row #1:   1   1   2   2   3   7   7   7 
    DQT, Row #2:   2   2   2   3   5   7   8   7 
    DQT, Row #3:   2   2   3   3   6  10  10   7 
    DQT, Row #4:   2   3   4   7   8  13  12   9 
    DQT, Row #5:   3   4   7   8  10  12  14  11 
    DQT, Row #6:   6   8   9  10  12  15  14  12 
    DQT, Row #7:   9  11  11  12  13  12  12  12 
    Approx quality factor = 94.02 (scaling=11.97 variance=1.37)
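For anyone who wants to reproduce the marker offsets in the dump above without a dedicated tool, here is a minimal sketch of a JPEG segment walker. It assumes the usual layout in which each segment is a two-byte marker followed by a two-byte big-endian length that counts itself but not the marker, and it stops at the start-of-scan marker. The function name and the zero-filled sample are hypothetical; the sample only mimics the segment lengths listed above (16, 58, 67), not the real payloads.

```python
import struct


def list_jpeg_segments(data: bytes):
    """Return (offset, marker, length) for each segment up to the scan data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    segments = [(0, 0xFFD8, 0)]   # SOI carries no length field
    pos = 2
    while pos + 4 <= len(data):
        marker, length = struct.unpack(">HH", data[pos:pos + 4])
        if marker >> 8 != 0xFF:
            raise ValueError(f"expected a marker at offset {pos:#x}")
        segments.append((pos, marker, length))
        if marker == 0xFFDA:      # SOS: entropy-coded data follows, stop here
            break
        pos += 2 + length         # skip the marker plus the counted length
    return segments


def _fake_segment(marker: int, length: int) -> bytes:
    """Build a zero-filled segment of the given length (illustration only)."""
    return struct.pack(">HH", marker, length) + bytes(length - 2)


# Synthetic header with the same segment lengths as the dump above
sample = (b"\xff\xd8" + _fake_segment(0xFFE0, 16)
          + _fake_segment(0xFFE1, 58) + _fake_segment(0xFFDB, 67))
for offset, marker, length in list_jpeg_segments(sample):
    print(f"marker x{marker:04X} at 0x{offset:08X}, length {length}")
```

With those lengths the walker lands on offsets 0x00, 0x02, 0x14 and 0x50, the same offsets the dump reports for SOI, APP0, APP1 and DQT.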

I'm nearly convinced that no encryption algorithm was applied, and therefore no key is involved in the concealment. My idea is to write a script that would shift the LSB values and return the originals. I've run the file through several structure analyses, statistical attacks, BPCS,

The histogram of the image shows a specific color with an unusual spike. I've manipulated that as best I can to try to view any hidden data, but to no avail. These are the histograms of the RGB values:

[Image: histograms of the RGB values]

Then there are the multiple IDAT chunks. However, I've put together a similar image by assigning random color values to each pixel location, and I too wound up with several of these. So far, I've also found very little inside them. Even more interesting is the way that color values are repeated in the image. It seems that the frequency of reused colors could hold some clue, but I have yet to fully understand that relationship, if one exists. Additionally, there is only a single column and a single row of pixels that do not have a full value of 255 in their alpha channel. I've even interpreted the X, Y, A, R, G, and B values of every pixel in the image as ASCII, but wound up with nothing legible. Even the green curve of the average of the LSBs tells us nothing; there is no evident break. Here are several other histograms which show the weird curve of the blue value from the RGB:
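To look at the IDAT chunks directly, a chunk lister along these lines may help. It is a sketch that assumes the standard PNG layout (8-byte signature, then chunks made of a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 over type plus data); the function name is my own. Note that in PNG the split of image data across several IDAT chunks carries no meaning of its own, since the chunks are concatenated into one compressed stream, which fits the observation above that a randomly generated image also ended up with several of them.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def list_png_chunks(data: bytes):
    """Return (type, length, crc_ok) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG: bad signature")
    chunks = []
    pos = 8
    while pos + 12 <= len(data):          # 12 = length + type + CRC fields
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        (stored_crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        crc_ok = (zlib.crc32(ctype + body) & 0xFFFFFFFF) == stored_crc
        chunks.append((ctype.decode("ascii"), length, crc_ok))
        pos += 12 + length
    return chunks
```

Reading the file in binary mode and passing the bytes to `list_png_chunks` gives one tuple per chunk; a truncated file would raise a `struct.error` in this sketch rather than being handled gracefully.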

[Image: histograms showing the curve of the blue value]

But the red curve, the output of the chi-square analysis, shows some difference. It can see something that we cannot see. Statistical detection is more sensitive than our eyes, and I guess that was my final point. However, there is also a sort of latency in the red curve. Even without hidden data, it starts at its maximum and stays there for some time, which is close to a false positive. The LSBs in the image look very close to random, and the algorithm needs a large population (remember that the analysis is done on an incrementing population of pixels) before reaching a threshold where it can decide that they are not random after all, and the red curve starts to go down. The same sort of latency happens with hidden data. You hide 1 or 2 kB, but the red curve does not go down right after this amount of data. It waits a little, here until around 1.3 kB and 2.6 kB respectively. Here is a representation of the data types from a hex editor:
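The "incrementing population" idea behind the red curve can be sketched like this. It is a simplified pairs-of-values chi-square statistic, not the exact algorithm of whichever tool produced the plots; the function names and the `step` parameter are assumptions for illustration. Turning the statistic into the probability that the tool plots would additionally need a chi-square distribution function (for example SciPy's `scipy.stats.chi2.sf`), which is left out to keep the example dependency-free.

```python
def chi_square_pov(values):
    """Chi-square statistic over pairs of values (2k, 2k+1) of 8-bit samples.

    Returns (statistic, degrees_of_freedom). A small statistic means the two
    counts in each pair are nearly equal, as expected after LSB embedding.
    """
    counts = [0] * 256
    for v in values:
        counts[v] += 1
    stat, used_pairs = 0.0, 0
    for k in range(128):
        expected = (counts[2 * k] + counts[2 * k + 1]) / 2
        if expected > 0:                  # skip pairs that never occur
            stat += (counts[2 * k + 1] - expected) ** 2 / expected
            used_pairs += 1
    return stat, max(used_pairs - 1, 0)


def chi_square_curve(values, step):
    """Statistic on a growing prefix of the data, one point per `step` samples."""
    return [chi_square_pov(values[:n]) for n in range(step, len(values) + 1, step)]
```

Calling `chi_square_curve` on the channel bytes in storage order gives one point per block, which is roughly what the horizontal axis of such a plot represents.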

byte = 166
signed byte = -90
word = 40,358
signed word = -25,178
double word = 3,444,481,446
signed double word = -850,485,850
quad = 3,226,549,723,063,033,254
signed quad = 3,226,549,723,063,033,254
float = -216652384.
double = 5.51490063721e-093
word motorola = 42,653
double word motorola = 2,795,327,181
quad motorola = 12,005,838,827,773,085,484
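The integer and float readings in this listing are all views of the same bytes. As a cross-check, the first four bytes can be reconstructed from the listed values as A6 9D 4E CD (166 is 0xA6, and the little-endian double word 3,444,481,446 is 0xCD4E9DA6); the sketch below re-derives the listed readings with Python's `struct` module. Only the first four bytes are reconstructed here, so the quad and double readings are not covered.

```python
import struct

raw = bytes.fromhex("a69d4ecd")   # first four bytes, reconstructed from the listing

interpretations = {
    "byte": struct.unpack("<B", raw[:1])[0],
    "signed byte": struct.unpack("<b", raw[:1])[0],
    "word": struct.unpack("<H", raw[:2])[0],            # little-endian (Intel)
    "signed word": struct.unpack("<h", raw[:2])[0],
    "double word": struct.unpack("<I", raw)[0],
    "signed double word": struct.unpack("<i", raw)[0],
    "float": struct.unpack("<f", raw)[0],
    "word motorola": struct.unpack(">H", raw[:2])[0],   # big-endian (Motorola)
    "double word motorola": struct.unpack(">I", raw)[0],
}
for name, value in interpretations.items():
    print(f"{name} = {value}")
```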

Here's another spectrum to confirm the behavior of the blue (RGB) value.

[Image: spectrum of the blue value]

Please note that I needed to go through all of this in order to clarify the situation and the programming matter that I'm in pursuit of. This by itself makes my question NOT off-topic so I'd be glad if it doesn't get marked as such. Thank you.

c++
python
image-processing
steganography
asked on Stack Overflow Sep 1, 2013 by Keeper • edited Sep 3, 2013 by Keeper

1 Answer

0

For an image with LSB enhancement applied, I cannot think of a way to restore it to its original state, because nothing remains of the original RGB values: each one is set to either 255 or 0 depending on its least significant bit. The only other option I see here is that this is some sort of protocol involving quantum steganography.
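What can still be read back from an enhanced image is the LSB stream itself. The following is a minimal sketch of that step, assuming the channel values are visited in storage order and that the bits are packed most significant bit first; both are assumptions, since the actual embedding order is unknown. The function names and the sample list are made up for the example.

```python
def recover_lsbs(enhanced_channels):
    """Turn enhanced channel values (0 or 255) back into LSB bits (0 or 1)."""
    return [1 if value >= 128 else 0 for value in enhanced_channels]


def pack_bits(bits):
    """Pack bits into bytes, most significant bit first (assumed bit order)."""
    out = bytearray()
    usable = len(bits) - len(bits) % 8    # ignore a trailing partial byte
    for i in range(0, usable, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


# Hypothetical enhanced channel values encoding two bytes
sample = [0, 255, 0, 0, 255, 0, 0, 0,
          0, 255, 255, 0, 255, 0, 0, 255]
print(pack_bits(recover_lsbs(sample)))
```

The upper seven bits of every channel are gone after enhancement, so this recovers only the candidate payload bits, not the original picture.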

Matlab and some steganalysis techniques could be the key to your issue though.

Here's a Java chi-square class for some statistical analysis:

private long[] pov = new long[256];
together with three methods:

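// Expected count for each pair of values (2i, 2i+1): the average of the two.
public double[] getExpected() {
        double[] result = new double[pov.length / 2];
        for (int i = 0; i < result.length; i++) {
            // Divide by 2.0 so the average is not truncated by integer division.
            double avg = (pov[2 * i] + pov[2 * i + 1]) / 2.0;
            result[i] = avg;
        }
        return result;
}
// Count one occurrence of the byte value i (0..255).
public void incPov(int i) {
        pov[i]++;
}
// Observed count of the odd value in each pair.
public long[] getPov() {
        long[] result = new long[pov.length / 2];
        for (int i = 0; i < result.length; i++) {
            result[i] = pov[2 * i + 1];
        }
        return result;
}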

or try with some bitwise shift operations as:

// getRGB returns the pixel packed as 0xAARRGGBB
int pRGB = image.getRGB(x, y);
int alpha = (pRGB >> 24) & 0xFF;
int red = (pRGB >> 16) & 0xFF;
int green = (pRGB >> 8) & 0xFF;
int blue = pRGB & 0xFF;
answered on Stack Overflow Sep 22, 2013 by Lester Gifness

User contributions licensed under CC BY-SA 3.0