I'm trying to find the minimum of an array that has exactly 4 elements. Each element is a signed `int`, but only non-negative values are stored, and -1 is used to represent an invalid value.

The compiler vectorizes the second version correctly with SSE, using SIMD shuffles and compares. I expected this to run faster on Broadwell and Skylake, but when I microbenchmark it, the SIMD version runs slower, around 3.5 ns on Skylake and 2.7 ns on Broadwell.

Could you help me understand why?

```
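int example(int* values) {
    int min_value = 0x7FFFFFFF;  // INT_MAX: larger than any valid value
    for (int n = 0; n < 4; ++n) {
        // skip the -1 "invalid" sentinel, otherwise keep the smaller value
        if (values[n] != -1 &&
            values[n] < min_value) {
            min_value = values[n];
        }
    }
    return min_value;
}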
```

https://gcc.godbolt.org/z/iWC26S

```
int example(int* values) {
    // (unsigned)-1 == UINT_MAX, so an invalid entry never wins the min
    unsigned min_0 = (unsigned)values[0] < (unsigned)values[1] ? (unsigned)values[0] : (unsigned)values[1];
    unsigned min_1 = (unsigned)values[2] < (unsigned)values[3] ? (unsigned)values[2] : (unsigned)values[3];
    return min_0 < min_1 ? min_0 : min_1;
}
```

https://gcc.godbolt.org/z/b7JhNZ
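The second version relies on a trick the question leaves implicit: converting -1 to `unsigned int` yields `UINT_MAX` (C defines signed-to-unsigned conversion modulo `UINT_MAX + 1`), so an invalid entry never wins an unsigned comparison against a non-negative value, and no explicit `!= -1` test is needed. The sketch below checks that the two versions agree. The names `example_v1`, `example_v2` and the helper `count_mismatches` are hypothetical, introduced only so both versions fit in one file; the helper brute-forces all 256 arrays drawn from {-1, 0, 1, INT_MAX}.

```c
#include <limits.h>

/* Hypothetical name for the first version: explicit -1 sentinel check. */
int example_v1(int *values) {
    int min_value = INT_MAX;                 /* same as 0x7FFFFFFF */
    for (int n = 0; n < 4; ++n) {
        if (values[n] != -1 && values[n] < min_value) {
            min_value = values[n];
        }
    }
    return min_value;
}

/* Hypothetical name for the second version: unsigned compare. */
int example_v2(int *values) {
    /* (unsigned int)-1 == UINT_MAX, so an invalid entry never wins the min. */
    unsigned int a = (unsigned int)values[0], b = (unsigned int)values[1];
    unsigned int c = (unsigned int)values[2], d = (unsigned int)values[3];
    unsigned int min_0 = a < b ? a : b;
    unsigned int min_1 = c < d ? c : d;
    /* UINT_MAX -> int is implementation-defined; GCC and Clang yield -1. */
    return (int)(min_0 < min_1 ? min_0 : min_1);
}

/* Try every 4-element array drawn from {-1, 0, 1, INT_MAX} (4^4 = 256 inputs)
   and count how many make the two versions disagree. */
int count_mismatches(void) {
    const int choices[4] = { -1, 0, 1, INT_MAX };
    int mismatches = 0;
    for (int i = 0; i < 256; ++i) {
        int values[4];
        for (int k = 0; k < 4; ++k) {
            values[k] = choices[(i >> (2 * k)) & 3];   /* 2 bits per slot */
        }
        if (example_v1(values) != example_v2(values)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

With GCC or Clang, `count_mismatches()` returns 1: the only disagreement is the all-invalid array `{-1, -1, -1, -1}`, where the first version returns `INT_MAX` and the second returns -1.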

My whole program: https://gcc.godbolt.org/z/bUXBqd
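For reference, the shuffle-and-compare reduction that the second version turns into can be approximated by hand with SSE2 intrinsics. This is a sketch, not GCC's exact output (which depends on `-march`): the name `example_sse2` is hypothetical, and because SSE2 only has signed 32-bit compares (`_mm_cmpgt_epi32`), the sketch flips the sign bit to emulate the unsigned compare. With SSE4.1, `_mm_min_epu32` (`pminud`) does the compare and select in one instruction. A scalar fallback is included for targets without SSE2.

```c
#include <limits.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical name: SSE2 sketch of the second version (unsigned min of 4). */
int example_sse2(const int *values) {
#if defined(__SSE2__)
    /* Load all four ints with one unaligned 16-byte load. */
    __m128i v = _mm_loadu_si128((const __m128i *)values);

    /* SSE2 only compares signed 32-bit lanes; XOR-ing the sign bit turns
       an unsigned compare into a signed one. */
    const __m128i bias = _mm_set1_epi32(INT_MIN);
    v = _mm_xor_si128(v, bias);

    /* Step 1: swap the 64-bit halves -> {v2, v3, v0, v1}, keep the smaller lane. */
    __m128i other = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
    __m128i gt = _mm_cmpgt_epi32(v, other);      /* all-ones where v > other */
    v = _mm_or_si128(_mm_and_si128(gt, other), _mm_andnot_si128(gt, v));

    /* Step 2: swap adjacent lanes -> {v1, v0, v3, v2}, keep the smaller lane. */
    other = _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 3, 0, 1));
    gt = _mm_cmpgt_epi32(v, other);
    v = _mm_or_si128(_mm_and_si128(gt, other), _mm_andnot_si128(gt, v));

    /* Remove the bias; lane 0 now holds the minimum's bit pattern. */
    return _mm_cvtsi128_si32(_mm_xor_si128(v, bias));
#else
    /* Portable fallback with the same unsigned-min semantics. */
    unsigned a = (unsigned)values[0], b = (unsigned)values[1];
    unsigned c = (unsigned)values[2], d = (unsigned)values[3];
    unsigned min_0 = a < b ? a : b;
    unsigned min_1 = c < d ? c : d;
    return (int)(min_0 < min_1 ? min_0 : min_1);
#endif
}
```

Each shuffle/compare/select step consumes the previous step's result, so the reduction forms a serial dependency chain from the load to the final extract.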

User contributions licensed under CC BY-SA 3.0