I'm currently using the Intel SIMD function `_mm_cmplt_ps( V1, V2 )`. It returns a vector containing the result of each component test, based on whether each component of V1 is less than the corresponding component of V2. For example:

```
XMVECTOR Result;
Result.x = (V1.x < V2.x) ? 0xFFFFFFFF : 0;
Result.y = (V1.y < V2.y) ? 0xFFFFFFFF : 0;
Result.z = (V1.z < V2.z) ? 0xFFFFFFFF : 0;
Result.w = (V1.w < V2.w) ? 0xFFFFFFFF : 0;
return Result;
```

However, is there a function like this that returns 1 or 0 instead? It should use SIMD directly, with no workarounds, because it is supposed to be optimized and vectorized.

You can write that function yourself. It’s only 2 instructions:

```
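#include <xmmintrin.h> // SSE: _mm_cmplt_ps, _mm_and_ps, _mm_set1_ps

// 1.0 for lanes where a < b, zero otherwise
inline __m128 compareLessThan_01( __m128 a, __m128 b )
{
    // Each lane of cmp is all ones (0xFFFFFFFF) where a < b, all zeros otherwise
    const __m128 cmp = _mm_cmplt_ps( a, b );
    // Masking the bit pattern of 1.0f keeps 1.0f where the test passed, +0.0f elsewhere
    return _mm_and_ps( cmp, _mm_set1_ps( 1.0f ) );
}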
```
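If you want an integer 1 or 0 per lane rather than the float 1.0f, one possible variation is to reinterpret the mask as integers and shift the top bit down. This is only a sketch: it assumes SSE2 is available, and the name `compareLessThan_01i` is made up for illustration.

```cpp
#include <emmintrin.h> // SSE2: _mm_castps_si128, _mm_srli_epi32

// Hypothetical helper: integer 1 for lanes where a < b, integer 0 otherwise
inline __m128i compareLessThan_01i( __m128 a, __m128 b )
{
    // Reinterpret the float mask as four 32-bit integers (no conversion happens)
    const __m128i mask = _mm_castps_si128( _mm_cmplt_ps( a, b ) );
    // Logical right shift by 31 keeps only the top bit: 0xFFFFFFFF -> 1, 0 -> 0
    return _mm_srli_epi32( mask, 31 );
}
```

Like the float version, this is a compare plus one more instruction, and it needs no constant loaded from memory.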

Here’s a more generic version that returns either of two values. It requires SSE 4.1, which is almost universally available by now (97.94% of users). If you have to support SSE2-only hardware, emulate the blend with `_mm_and_ps`, `_mm_andnot_ps`, and `_mm_or_ps`.

```
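#include <smmintrin.h> // SSE 4.1: _mm_blendv_ps

// y for lanes where a < b, x otherwise
inline __m128 compareLessThan_xy( __m128 a, __m128 b, float x, float y )
{
    const __m128 cmp = _mm_cmplt_ps( a, b );
    // blendv picks the second operand in lanes where the mask's top bit is set
    return _mm_blendv_ps( _mm_set1_ps( x ), _mm_set1_ps( y ), cmp );
}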
```
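For the SSE2-only case, the emulation with `_mm_and_ps`, `_mm_andnot_ps`, and `_mm_or_ps` might look roughly like the following. Treat it as a sketch: the name `compareLessThan_xy_sse2` is hypothetical, and it relies on the compare producing all-ones or all-zeros lanes, which `_mm_cmplt_ps` does.

```cpp
#include <xmmintrin.h> // SSE: _mm_cmplt_ps, _mm_and_ps, _mm_andnot_ps, _mm_or_ps

// Hypothetical SSE2-friendly variant: y for lanes where a < b, x otherwise
inline __m128 compareLessThan_xy_sse2( __m128 a, __m128 b, float x, float y )
{
    const __m128 cmp = _mm_cmplt_ps( a, b );
    // Keep y only in lanes where the mask is all ones
    const __m128 pickY = _mm_and_ps( cmp, _mm_set1_ps( y ) );
    // _mm_andnot_ps computes (~cmp) & x, so x survives where the mask is zero
    const __m128 pickX = _mm_andnot_ps( cmp, _mm_set1_ps( x ) );
    // The two halves never overlap in a lane, so OR merges them
    return _mm_or_ps( pickY, pickX );
}
```

This costs three bitwise instructions instead of a single blend, which is the price of not depending on SSE 4.1.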

The DirectXMath no-intrinsics version of `_mm_cmplt_ps` is actually:

```
XMVECTORU32 Control = { { {
    (V1.vector4_f32[0] < V2.vector4_f32[0]) ? 0xFFFFFFFF : 0,
    (V1.vector4_f32[1] < V2.vector4_f32[1]) ? 0xFFFFFFFF : 0,
    (V1.vector4_f32[2] < V2.vector4_f32[2]) ? 0xFFFFFFFF : 0,
    (V1.vector4_f32[3] < V2.vector4_f32[3]) ? 0xFFFFFFFF : 0
} } };
return Control.v;
```

`XMVECTOR` is the same as `__m128`, which is 4 floats, so it needs the `XMVECTORU32` alias to make sure it's writing integers.

I use `_mm_movemask_ps` for the "Control Register" version of DirectXMath functions. It just collects the top-most bit of each SIMD value.

```
int result = _mm_movemask_ps(_mm_cmplt_ps( V1, V2 ));
```

The lower nibble of `result` will contain the bit pattern: a 1 bit for each value that passes the test, and a 0 bit for each value that fails it. This could be used to reconstruct 1 vs. 0.
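As a rough illustration of that reconstruction, the sketch below unpacks the mask into four scalar 1/0 flags. The helper name `lessThanFlags` and the `out` array are made up for this example. Bit 0 of the mask corresponds to the first (x) component.

```cpp
#include <xmmintrin.h> // SSE: _mm_cmplt_ps, _mm_movemask_ps

// Hypothetical helper: out[i] is 1 if component i of a is less than
// component i of b, and 0 otherwise
inline void lessThanFlags( __m128 a, __m128 b, int out[4] )
{
    // Gather the top bit of each lane into the low 4 bits of an int
    const int mask = _mm_movemask_ps( _mm_cmplt_ps( a, b ) );
    for( int i = 0; i < 4; ++i )
        out[i] = ( mask >> i ) & 1; // bit i belongs to lane i
}
```

This leaves the SIMD domain, so it fits best when the results feed scalar branching rather than further vector math.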

answered on Stack Overflow Jul 8, 2020 by Chuck Walbourn

User contributions licensed under CC BY-SA 3.0