Is there an Intel SIMD comparison function that returns 0 or 1 instead of 0 or 0xFFFFFFFF?

0

I'm currently using the Intel SIMD intrinsic _mm_cmplt_ps( V1, V2 ). It returns a vector holding the result of each component test: all bits set where the V1 component is less than the V2 component, and zero otherwise. For example:

XMVECTOR Result;

Result.x = (V1.x < V2.x) ? 0xFFFFFFFF : 0;
Result.y = (V1.y < V2.y) ? 0xFFFFFFFF : 0;
Result.z = (V1.z < V2.z) ? 0xFFFFFFFF : 0;
Result.w = (V1.w < V2.w) ? 0xFFFFFFFF : 0;

return Result;

However, is there a function like this that returns 1 or 0 instead? I'm looking for one that uses SIMD directly, with no workarounds, because the code is supposed to be optimized and vectorized.

intel
sse
simd
intrinsics
asked on Stack Overflow Jul 8, 2020 by Xardas110 • edited Jul 8, 2020 by chtz

2 Answers

1

You can write that function yourself. It’s only 2 instructions:

// 1.0 for lanes where a < b, zero otherwise
inline __m128 compareLessThan_01( __m128 a, __m128 b )
{
    const __m128 cmp = _mm_cmplt_ps( a, b );
    return _mm_and_ps( cmp, _mm_set1_ps( 1.0f ) );
}
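
If integer 1 is wanted rather than 1.0f, one possible variation is to shift the comparison mask right by 31 bits. This is only a sketch: the function name is made up for illustration, and it assumes SSE2 is available for the integer shift.

```cpp
#include <emmintrin.h>  // SSE2: _mm_castps_si128, _mm_srli_epi32

// Hypothetical name, not part of any library.
// 32-bit integer 1 for lanes where a < b, 0 otherwise
inline __m128i compareLessThan_01_epi32( __m128 a, __m128 b )
{
    // Each lane is 0xFFFFFFFF or 0 after the comparison
    const __m128i cmp = _mm_castps_si128( _mm_cmplt_ps( a, b ) );
    // Logical right shift keeps only the top bit, leaving 1 or 0
    return _mm_srli_epi32( cmp, 31 );
}
```

The cast is free (it only reinterprets the register), so this is also 2 instructions.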

Here’s a more generic version which returns either of 2 values. It requires SSE 4.1, which is almost universally available by now, with 97.94% of users. If you have to support SSE2-only, emulate it with _mm_and_ps, _mm_andnot_ps, and _mm_or_ps.

// y for lanes where a < b, x otherwise
inline __m128 compareLessThan_xy( __m128 a, __m128 b, float x, float y )
{
    const __m128 cmp = _mm_cmplt_ps( a, b );
    return _mm_blendv_ps( _mm_set1_ps( x ), _mm_set1_ps( y ), cmp );
}
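
For reference, the SSE2-only emulation mentioned above could look roughly like this. It's a sketch with a made-up function name, not tested across compilers:

```cpp
#include <emmintrin.h>  // SSE2 header, also pulls in the SSE _ps intrinsics

// Hypothetical name: SSE2-only stand-in for compareLessThan_xy.
// y for lanes where a < b, x otherwise
inline __m128 compareLessThan_xy_sse2( __m128 a, __m128 b, float x, float y )
{
    const __m128 cmp = _mm_cmplt_ps( a, b );                      // all ones where a < b
    const __m128 pickY = _mm_and_ps( cmp, _mm_set1_ps( y ) );     // y where a < b, 0 elsewhere
    const __m128 pickX = _mm_andnot_ps( cmp, _mm_set1_ps( x ) );  // x where !(a < b), 0 elsewhere
    return _mm_or_ps( pickY, pickX );                             // merge the two halves
}
```

Note that _mm_andnot_ps inverts its first argument, so the mask has to go first.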

answered on Stack Overflow Jul 9, 2020 by Soonts • edited Jul 9, 2020 by Soonts

0

The DirectXMath no-intrinsics version of _mm_cmplt_ps is actually:

    XMVECTORU32 Control = { { {
            (V1.vector4_f32[0] < V2.vector4_f32[0]) ? 0xFFFFFFFF : 0,
            (V1.vector4_f32[1] < V2.vector4_f32[1]) ? 0xFFFFFFFF : 0,
            (V1.vector4_f32[2] < V2.vector4_f32[2]) ? 0xFFFFFFFF : 0,
            (V1.vector4_f32[3] < V2.vector4_f32[3]) ? 0xFFFFFFFF : 0
        } } };
    return Control.v;

XMVECTOR is the same as __m128, which is 4 floats, so the code needs the XMVECTORU32 alias to make sure it's writing integers.

I use _mm_movemask_ps for the "Control Register" version of DirectXMath functions. It just collects the top-most (sign) bit of each of the 4 SIMD values.

int result = _mm_movemask_ps(_mm_cmplt_ps( V1, V2 ));

The lower nibble of result holds one bit per component, with bit 0 for x and bit 3 for w: a 1 bit for each value that passes the test, and a 0 bit for each value that fails it. This can be used to reconstruct 1 vs. 0.
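
As a rough sketch of that reconstruction, the bits can be pulled out one at a time. The helper name and the int[4] output are assumptions for illustration only, and this leaves the SIMD domain, so it suits cases where scalar code consumes the result.

```cpp
#include <xmmintrin.h>  // SSE: _mm_cmplt_ps, _mm_movemask_ps

// Hypothetical helper: writes 1 or 0 per component into out[0..3],
// out[0] for x and out[3] for w.
inline void compareLessThan_bits( __m128 V1, __m128 V2, int out[4] )
{
    const int mask = _mm_movemask_ps( _mm_cmplt_ps( V1, V2 ) );
    for( int lane = 0; lane < 4; lane++ )
        out[lane] = ( mask >> lane ) & 1;  // bit `lane` of mask is the sign bit of that component
}
```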

answered on Stack Overflow Jul 8, 2020 by Chuck Walbourn

User contributions licensed under CC BY-SA 3.0