Migrating from XMM to YMM

1

Consider:

movdqa xmm0, xmmword ptr [rcx]
movdqa xmm1, xmmword ptr [rcx + 16]
movdqa xmm2, xmmword ptr [rcx + 32]
movdqa xmm3, xmmword ptr [rcx + 48]

The above code works fine. rcx holds the address of the first element of an array of 32-bit unsigned integers.

However, when I try to use ymm registers in a similar fashion:

vmovdqa ymm0, ymmword ptr [rcx]
vmovdqa ymm1, ymmword ptr [rcx + 32]

The code randomly crashes with an access violation: Exception thrown at 0x00007FF95ACC102C (Asm.dll) in Asm.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.

Why does this happen, and how can I make it work?

My CPU is an i5-10210U (supports 256-bit AVX). I'm building for x64, in both Release and Debug.

assembly
masm
sse
simd
avx
asked on Stack Overflow Nov 4, 2020 by weno

2 Answers

4

Is rcx aligned to 32 bytes? movdqa xmm, m128 requires 16-byte alignment, but vmovdqa ymm, m256 requires 32-byte alignment, so if you just port the code to AVX2 without increasing the alignment, it won't work.

Either increase the alignment to 32 bytes or use vmovdqu instead, which has no alignment requirement. Unlike SSE instructions, memory operands of AVX instructions generally do not have alignment requirements (explicitly aligned moves such as vmovdqa are among the few exceptions). It is still a good idea to align your input data if possible, as memory accesses that cross cache lines incur extra penalties.
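To make the alignment point concrete, here is a minimal C sketch of the caller side. It is only an illustration under assumptions: the question does not show how the array is allocated or which language calls the routine, so the helper names is_aligned_16 and is_aligned_32 and the buffer values are hypothetical. It also suggests why the crash looks random: an address that is 16-byte aligned is 32-byte aligned only when bit 4 happens to be clear, so the ymm version fails or works depending on where the allocator places the array.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when p sits on a 32-byte boundary, which vmovdqa ymm, m256 requires. */
static bool is_aligned_32(const void *p)
{
    return ((uintptr_t)p & 31u) == 0;
}

/* True when p sits on a 16-byte boundary, which is all movdqa xmm, m128 needs. */
static bool is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical input buffer: 16 uint32 values = 64 bytes = two ymm loads.
   alignas(32) asks the compiler to place it on a 32-byte boundary. */
static alignas(32) uint32_t values[16];
```

With this buffer, is_aligned_32(values) holds, while (const unsigned char *)values + 16 still passes the 16-byte check but fails the 32-byte one, which is the situation where movdqa works and vmovdqa faults. For heap memory, an aligned allocator would be needed instead (for example _aligned_malloc on MSVC, or aligned_alloc where the C library provides it).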

answered on Stack Overflow Nov 11, 2020 by fuz
1

Is your memory aligned for 256-bit AVX operations? vmovdqa on a ymm register needs a 32-byte boundary (a 64-byte boundary also satisfies this).

answered on Stack Overflow Nov 11, 2020 by Ron

User contributions licensed under CC BY-SA 3.0