I am trying to store the contents of an XMM register at a specific memory address, such as 4534342.
Example:
What I have done?
I know that the XMM registers contain 128-bit values, so my program has already allocated 16 bytes of memory. In addition, since the register data must be aligned, I have allocated 31 bytes and then found an aligned address within that block. That should prevent any exceptions from being thrown.
What I am trying to do visually?
Mem Adr | Contents (binary)
4534342 | 0000 0000 0000 0000 ; I want to pass in address 4534342, and the
4534346 | 0000 0000 0000 0000 ; inline assembly will store the register's
4534348 | 0000 0000 0000 0000 ; contents from there straight down to
4534350 | 0000 0000 0000 0000 ; address 4534350
4534352 | 0000 0000 0000 0000
4534354 | 0000 0000 0000 0000
4534356 | 0000 0000 0000 0000
4534358 | 0000 0000 0000 0000
Setup
cyg_uint8 *start_value; //pointer to the first value of the allocated block
cyg_uint32 end_address; //aligned address location value
cyg_uint32 *aligned_value; //pointer to the value at the end_address
start_value = xmm_container; //get the pointer to the allocated block
end_address = (((unsigned int) start_value) + 0x0000001f) & 0xFFFFFFF8; //find aligned memory
aligned_value = ((cyg_uint32*)end_address); //create a pointer to get the first value of the block
Debug statements BEFORE assembly call to ensure function
printf("aligned_value: %d\n", (cyg_uint32) aligned_value);
printf("*aligned_value: %d\n", *aligned_value);
Assembly Call
__asm__("movdqa %%xmm0, %0\n" : "=m"(*aligned_value)); //assembly call
Debug statements AFTER assembly call to ensure function
printf("aligned_value: %d\n", (cyg_uint32) aligned_value);
printf("*aligned_value: %d\n", *aligned_value);
The output from printf [FAILURE]
aligned_value: 1661836 //Looks good!
*aligned_value: 0 //Looks good!
aligned_value: -1 //Looks wrong :(
//then program gets stuck
Basically, am I doing this process correctly? Why do you think it is getting stuck?
Thank you for your time and effort.
I don't think your alignment logic is correct if you want a 16-byte aligned address.
Just do the math for a few start addresses (the 0x1F in the comments is the 31-byte buffer size):
(0 + 0x1f) & 0xFFFFFFF8 = 0x18 ; 0x18-0=0x18 unused bytes, 0x1F-0x18=7 bytes left
(1 + 0x1f) & 0xFFFFFFF8 = 0x20 ; 0x20-1=0x1F unused bytes, 0x1F-0x1F=0 bytes left
...
(8 + 0x1f) & 0xFFFFFFF8 = 0x20 ; 0x20-8=0x18 unused bytes, 0x1F-0x18=7 bytes left
(9 + 0x1f) & 0xFFFFFFF8 = 0x28 ; 0x28-9=0x1F unused bytes, 0x1F-0x1F=0 bytes left
...
(0xF + 0x1f) & 0xFFFFFFF8 = 0x28 ; 0x28-0xF=0x19 unused bytes, 0x1F-0x19=6 bytes left
(0x10 + 0x1f) & 0xFFFFFFF8 = 0x28 ; 0x28-0x10=0x18 unused bytes, 0x1F-0x18=7 bytes left
(0x11 + 0x1f) & 0xFFFFFFF8 = 0x30 ; 0x30-0x11=0x1F unused bytes, 0x1F-0x1F=0 bytes left
...
(0x18 + 0x1f) & 0xFFFFFFF8 = 0x30 ; 0x30-0x18=0x18 unused bytes, 0x1F-0x18=7 bytes left
(0x19 + 0x1f) & 0xFFFFFFF8 = 0x38 ; 0x38-0x19=0x1F unused bytes, 0x1F-0x1F=0 bytes left
...
(0x1F + 0x1f) & 0xFFFFFFF8 = 0x38 ; 0x38-0x1F=0x19 unused bytes, 0x1F-0x19=6 bytes left
First, to get all zeroes in the 4 least significant bits, the mask should be 0xFFFFFFF0.
Next, you're overflowing the 31-byte buffer if you calculate the aligned address this way. Your math leaves you with 0 to 7 bytes of space, which isn't sufficient to store 16 bytes.
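If you would rather check the arithmetic than take my word for it, here is a minimal sketch that recomputes the rows above. The names round_up, bytes_left, print_rows and BUF_SIZE are made up for this illustration, and it works on plain offsets rather than real pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BUF_SIZE 31u /* size of the buffer allocated in the question */

/* Apply an "add, then mask" rounding formula to a start offset. */
uint32_t round_up(uint32_t start, uint32_t addend, uint32_t mask)
{
    return (start + addend) & mask;
}

/* Bytes of the 31-byte buffer that remain at the rounded address. */
uint32_t bytes_left(uint32_t start, uint32_t addend, uint32_t mask)
{
    uint32_t skipped = round_up(start, addend, mask) - start;
    assert(skipped <= BUF_SIZE); /* holds for both formulas discussed here */
    return BUF_SIZE - skipped;
}

/* Print one row per start offset 0x00..0x1F for the given formula. */
void print_rows(uint32_t addend, uint32_t mask)
{
    for (uint32_t start = 0; start < 0x20; start++) {
        uint32_t rounded = round_up(start, addend, mask);
        printf("start=0x%02X rounded=0x%02X left=%u 16-aligned=%s\n",
               (unsigned)start, (unsigned)rounded,
               (unsigned)bytes_left(start, addend, mask),
               (rounded & 0xFu) == 0 ? "yes" : "no");
    }
}
```

Calling print_rows(0x1F, 0xFFFFFFF8u) should reproduce the rows above, including the results such as 0x18 and 0x28 that are not multiples of 16. Calling print_rows(0xF, 0xFFFFFFF0u) should show between 16 and 31 bytes left for every start offset.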
For correct 16-byte alignment you should write this:
end_address = (((unsigned int)start_value) + 0xF) & 0xFFFFFFF0;
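Putting it together, the store might look roughly like the sketch below. This is not your exact code: align_up_16, xmm_slot and store_xmm0 are names I made up, I use uintptr_t instead of unsigned int so the cast also works with 64-bit pointers, and I assume GCC-style inline assembly on an SSE2 target (otherwise the sketch just zeroes the slot). One more thing that may matter: "=m"(*aligned_value) with a cyg_uint32 only tells the compiler that 4 bytes are written, so the sketch uses a 16-byte type for the operand.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round an address up to the next multiple of 16 (unchanged if aligned). */
uintptr_t align_up_16(uintptr_t addr)
{
    return (addr + 0xF) & ~(uintptr_t)0xF;
}

/* Hypothetical 16-byte type so the memory operand covers the whole store. */
typedef struct { unsigned char bytes[16]; } xmm_slot;

/* buf31 must point to at least 31 bytes; returns the aligned slot inside it. */
xmm_slot *store_xmm0(unsigned char *buf31)
{
    xmm_slot *slot = (xmm_slot *)align_up_16((uintptr_t)buf31);

    /* At most 15 bytes are skipped, so 16 bytes still fit in 31. */
    assert(((uintptr_t)slot & 0xF) == 0);
    assert((unsigned char *)slot + sizeof *slot <= buf31 + 31);

#if defined(__GNUC__) && defined(__SSE2__)
    /* Aligned 128-bit store of whatever xmm0 currently holds. */
    __asm__ volatile("movdqa %%xmm0, %0" : "=m"(*slot));
#else
    /* Assumed fallback for non-SSE2 builds: just clear the slot. */
    memset(slot, 0, sizeof *slot);
#endif
    return slot;
}
```

You would call it as store_xmm0(xmm_container) and then read the 16 bytes back through slot->bytes.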