SIMD min slower than normal scalar


I'm trying to find the minimum of an array which has exactly 4 elements. Each element is a signed int type, but only non-negative numbers are used, and -1 is used to represent an invalid value.

The code generated for the 2nd version uses SSE: the compiler emits SIMD shuffle and compare instructions, as intended. I expected this to run faster on Broadwell and Skylake, but in my microbenchmark the SIMD version is slower: around 3.5 ns on Skylake and 2.7 ns on Broadwell.

Could you help me understand why?

int example(int* values) {
  int min_value = 0x7FFFFFFF;
  for (int n = 0; n < 4; ++n) {
    if (values[n] != -1 &&
        values[n] < min_value) {
      min_value = values[n];
    }
  }
  return min_value;
}

https://gcc.godbolt.org/z/iWC26S

int example(int* values) {
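  // Compare as unsigned so that -1 (0xFFFFFFFF) is larger than any valid value.
  unsigned min_0 = (unsigned)values[0] < (unsigned)values[1] ? (unsigned)values[0] : (unsigned)values[1];
  unsigned min_1 = (unsigned)values[2] < (unsigned)values[3] ? (unsigned)values[2] : (unsigned)values[3];

  return min_0 < min_1 ? min_0 : min_1;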
}

https://gcc.godbolt.org/z/b7JhNZ
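
To make the unsigned-cast trick concrete, here is a small sketch that puts the two versions side by side. The names `min_scalar` and `min_unsigned` are hypothetical, picked only for illustration; the logic follows the two snippets above. Casting -1 to a 32-bit unsigned value gives 0xFFFFFFFF, so an invalid entry loses every comparison against a valid one. One thing to watch: the two versions disagree when all four entries are -1 (the first returns 0x7FFFFFFF, the second returns -1).

```cpp
#include <algorithm>
#include <cstdint>

// First version: skip -1 explicitly while scanning the four elements.
int min_scalar(const int* values) {
  int min_value = 0x7FFFFFFF;
  for (int n = 0; n < 4; ++n) {
    if (values[n] != -1 && values[n] < min_value) {
      min_value = values[n];
    }
  }
  return min_value;
}

// Second version: compare as unsigned so that -1 (0xFFFFFFFF) sorts last.
int min_unsigned(const int* values) {
  const auto u = [&](int i) { return static_cast<std::uint32_t>(values[i]); };
  const std::uint32_t min_0 = std::min(u(0), u(1));
  const std::uint32_t min_1 = std::min(u(2), u(3));
  // Converting 0xFFFFFFFF back to int yields -1 on two's-complement targets.
  return static_cast<int>(std::min(min_0, min_1));
}
```

For an input such as `{7, -1, 3, 9}` both functions return 3; they only differ in the all-invalid case.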

My whole program https://gcc.godbolt.org/z/bUXBqd
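
For readers who do not want to open the link, below is a minimal sketch of the kind of timing loop such a microbenchmark might use. It is not the program linked above; `ns_per_call`, its parameters, and the input values are assumptions made for illustration. It reports the average wall-clock time per call and returns a checksum so the compiler cannot discard the work entirely.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: average nanoseconds per call of `fn` over `iters` calls.
// `fn` takes a pointer to 4 ints and returns an int, like `example` above.
template <typename Fn>
double ns_per_call(Fn fn, std::int64_t iters, std::int64_t& checksum) {
  int values[4] = {5, -1, 9, 2};
  std::int64_t sum = 0;
  const auto start = std::chrono::steady_clock::now();
  for (std::int64_t i = 0; i < iters; ++i) {
    // Change the input each iteration to make hoisting the call harder.
    values[0] = static_cast<int>(i & 0xFF);
    sum += fn(values);
  }
  const auto stop = std::chrono::steady_clock::now();
  checksum = sum;  // expose the result so the loop is not dead code
  const double total_ns =
      std::chrono::duration<double, std::nano>(stop - start).count();
  return total_ns / static_cast<double>(iters);
}
```

A call might look like `ns_per_call(example, 100000000, checksum)`. Differences of a few nanoseconds are close to the loop overhead itself, so it is worth checking the generated assembly of the timing loop as well as of the function under test.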

c++
assembly
sse
simd
avx
asked on Stack Overflow Jul 4, 2020 by daole • edited Jul 4, 2020 by daole

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0