# Is there an Intel SIMD comparison function that returns 0 or 1 instead of 0 or 0xFFFFFFFF?

0

I'm currently using the Intel SIMD intrinsic `_mm_cmplt_ps( V1, V2 )`. It returns a vector containing the result of each component-wise test: a component is 0xFFFFFFFF if the V1 component is less than the V2 component, and 0 otherwise. For example:

``````
XMVECTOR Result;

Result.x = (V1.x < V2.x) ? 0xFFFFFFFF : 0;
Result.y = (V1.y < V2.y) ? 0xFFFFFFFF : 0;
Result.z = (V1.z < V2.z) ? 0xFFFFFFFF : 0;
Result.w = (V1.w < V2.w) ? 0xFFFFFFFF : 0;

return Result;
``````

However, is there a function like this that returns 1 or 0 instead? I'd like one that uses SIMD directly, without workarounds, because this code is supposed to be optimized and vectorized.

intel
sse
simd
intrinsics

1

You can write that function yourself. It’s only 2 instructions:

``````
#include <xmmintrin.h> // SSE

// 1.0 for lanes where a < b, zero otherwise
inline __m128 compareLessThan_01( __m128 a, __m128 b )
{
    const __m128 cmp = _mm_cmplt_ps( a, b );
    return _mm_and_ps( cmp, _mm_set1_ps( 1.0f ) );
}
``````
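The function above produces the float `1.0f`. If integer 1 or 0 per lane is what's wanted, one possible variant is sketched below. The name `compareLessThan_int01` is hypothetical, and it assumes SSE2 is available: it reinterprets the comparison mask as integers and shifts each lane right by 31 bits, so 0xFFFFFFFF becomes 1 and 0 stays 0.

```cpp
#include <emmintrin.h> // SSE2

// Hypothetical helper: integer 1 for lanes where a < b, integer 0 otherwise
inline __m128i compareLessThan_int01( __m128 a, __m128 b )
{
    // Each 32-bit lane is 0xFFFFFFFF where a < b, and 0 otherwise
    const __m128i mask = _mm_castps_si128( _mm_cmplt_ps( a, b ) );
    // Logical right shift keeps only the top bit: 0xFFFFFFFF -> 1, 0 -> 0
    return _mm_srli_epi32( mask, 31 );
}
```

This is still 2 instructions, since the cast is only a reinterpretation and generates no code.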

Here’s a more generic version which returns either of 2 values. It requires SSE 4.1, which is almost universally available by now, with 97.94% of users. If you have to support SSE2-only hardware, emulate it with `_mm_and_ps`, `_mm_andnot_ps`, and `_mm_or_ps`.

``````
#include <smmintrin.h> // SSE 4.1

// y for lanes where a < b, x otherwise
inline __m128 compareLessThan_xy( __m128 a, __m128 b, float x, float y )
{
    const __m128 cmp = _mm_cmplt_ps( a, b );
    return _mm_blendv_ps( _mm_set1_ps( x ), _mm_set1_ps( y ), cmp );
}
``````
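For the SSE2-only case, the emulation with `_mm_and_ps`, `_mm_andnot_ps`, and `_mm_or_ps` might look like the sketch below. The name `compareLessThan_xy_sse2` is hypothetical. It relies on the comparison result being all ones or all zeros in each lane, which holds for `_mm_cmplt_ps`.

```cpp
#include <emmintrin.h> // SSE2

// Hypothetical SSE2-only counterpart of compareLessThan_xy:
// y for lanes where a < b, x otherwise
inline __m128 compareLessThan_xy_sse2( __m128 a, __m128 b, float x, float y )
{
    const __m128 cmp = _mm_cmplt_ps( a, b );
    // Keep y in lanes where the mask is all ones
    const __m128 whenTrue = _mm_and_ps( cmp, _mm_set1_ps( y ) );
    // _mm_andnot_ps computes (~cmp) & x, keeping x where the mask is zero
    const __m128 whenFalse = _mm_andnot_ps( cmp, _mm_set1_ps( x ) );
    return _mm_or_ps( whenTrue, whenFalse );
}
```

Unlike `_mm_blendv_ps`, which only looks at the top bit of each mask lane, this version uses every bit of the mask, so it should only be fed masks produced by comparison intrinsics.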
0

The DirectXMath no-intrinsics version of `_mm_cmplt_ps` is actually:

``````
XMVECTORU32 Control = { { {
    (V1.vector4_f32[0] < V2.vector4_f32[0]) ? 0xFFFFFFFF : 0,
    (V1.vector4_f32[1] < V2.vector4_f32[1]) ? 0xFFFFFFFF : 0,
    (V1.vector4_f32[2] < V2.vector4_f32[2]) ? 0xFFFFFFFF : 0,
    (V1.vector4_f32[3] < V2.vector4_f32[3]) ? 0xFFFFFFFF : 0
} } };
return Control.v;
``````

`XMVECTOR` is the same as `__m128`, which is 4 floats, so it needs the `XMVECTORU32` alias to make sure it's writing integers.

I use `_mm_movemask_ps` for the "Control Register" version of DirectXMath functions. It just collects the top-most (sign) bit of each SIMD lane into an integer.

``````
int result = _mm_movemask_ps(_mm_cmplt_ps( V1, V2 ));
``````

The lower nibble of `result` will contain the bit pattern: a 1 bit for each value that passes the test, and a 0 bit for each value that fails it. This could be used to reconstruct 1 vs. 0.
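As a rough sketch of that reconstruction, the helper below unpacks the mask into four scalar ints. The name `compareLessThan_bits` and the output-array interface are assumptions made for illustration only. Bit `i` of the mask corresponds to lane `i` of the vectors.

```cpp
#include <xmmintrin.h> // SSE

// Hypothetical helper: out[i] is 1 if lane i of a is less than lane i of b, else 0
inline void compareLessThan_bits( __m128 a, __m128 b, int out[4] )
{
    // Bit i of mask is the sign bit of lane i of the comparison result
    const int mask = _mm_movemask_ps( _mm_cmplt_ps( a, b ) );
    for( int i = 0; i < 4; i++ )
        out[i] = ( mask >> i ) & 1;
}
```

Note that this leaves the SIMD domain: the results end up in scalar integers, which suits branching on the outcome more than further vector math.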

User contributions licensed under CC BY-SA 3.0