SIMD min slower than normal scalar


I'm trying to find the minimum of an array that has exactly 4 elements. Each element is a signed int, but only non-negative values are used, and -1 represents an invalid value.

The code generated for the 2nd version uses SSE, with SIMD shuffles and compares as intended. I expected it to run faster on Broadwell and Skylake, but when I microbenchmark it, the SIMD version runs slower (around 3.5 ns on Skylake and 2.7 ns on Broadwell).

Could you help me understand why?

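int example(int* values) {
  int min_value = 0x7FFFFFFF;  // INT_MAX; returned if every element is -1
  for (int n = 0; n < 4; ++n) {
    // Skip the -1 sentinel and keep the smallest valid value.
    if (values[n] != -1 &&
        values[n] < min_value) {
      min_value = values[n];
    }
  }
  return min_value;
}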

int example(int* values) {
  // As unsigned, -1 becomes 0xFFFFFFFF, so it loses to any valid value.
  uint min_0 = (uint)values[0] < (uint)values[1] ? (uint)values[0] : (uint)values[1];
  uint min_1 = (uint)values[2] < (uint)values[3] ? (uint)values[2] : (uint)values[3];

  return min_0 < min_1 ? min_0 : min_1;
}
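
For anyone comparing the two versions, here is a minimal sketch that puts them side by side under distinct names so they can be checked against each other before timing them. The names `min_skip_invalid`, `min_unsigned` and `check_versions` are hypothetical and not from my program, and I use `unsigned` in place of `uint`. One thing the sketch makes visible: the two versions agree whenever at least one element is valid, but they differ when all four are -1 (the first returns 0x7FFFFFFF, the second returns -1).

```c
#include <assert.h>
#include <limits.h>

/* Version 1: scalar loop that explicitly skips the -1 sentinel. */
int min_skip_invalid(const int *values) {
  int min_value = INT_MAX;
  for (int n = 0; n < 4; ++n) {
    if (values[n] != -1 && values[n] < min_value) {
      min_value = values[n];
    }
  }
  return min_value;
}

/* Version 2: compare as unsigned. (unsigned)-1 is UINT_MAX, so the
   sentinel can only win if every element is invalid. */
int min_unsigned(const int *values) {
  unsigned a = (unsigned)values[0], b = (unsigned)values[1];
  unsigned c = (unsigned)values[2], d = (unsigned)values[3];
  unsigned min_0 = a < b ? a : b;
  unsigned min_1 = c < d ? c : d;
  unsigned m = min_0 < min_1 ? min_0 : min_1;
  /* Map UINT_MAX back to -1 explicitly instead of relying on the
     implementation-defined unsigned-to-int conversion. */
  return m == UINT_MAX ? -1 : (int)m;
}

/* Small self-check with hand-picked inputs (assumed, not benchmark data). */
void check_versions(void) {
  int mixed[4] = {3, -1, 7, 5};
  int all_invalid[4] = {-1, -1, -1, -1};
  assert(min_skip_invalid(mixed) == min_unsigned(mixed));
  /* The versions disagree only when nothing is valid. */
  assert(min_skip_invalid(all_invalid) == INT_MAX);
  assert(min_unsigned(all_invalid) == -1);
}
```

Calling `check_versions()` once before the timed loop is enough to confirm that both functions are measuring the same work on valid inputs.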

My whole program

asked on Stack Overflow Jul 4, 2020 by daole • edited Jul 4, 2020 by daole

