Right inverting movemask
May 20, 2024
Introduction
movemask
instructions take a vector register of values, and construct an integer where the ith bit is set precisely when the higher order byte of the ith lane is set.
For example, the pmovmskb
instruction (_mm_movemask_epi8
intrinsic) from the SSE2 instruction set takes a register of 16 bytes, and constructs an integer where the ith bit is set when the high-order bit of the ith byte is set.
For example:
|0xFF|0xFF|0x00|0x00|0x80|0x80|0x00|0x08|0x00|0x00|0x00|0x00|0xFF|0xFF|0xFF|0xFF|
1 1 0 0 1 1 0 0 0 0 0 0 1 1 1 1
=> 1100110000001111
Sometimes it is useful to go the other way around, and starting from an integer, construct a vector mask where the ith byte is populated when the ith bit is set.
1100110000001111
becomes
|0xFF|0xFF|0x00|0x00|0xFF|0xFF|0x00|0x00|0x00|0x00|0x00|0x00|0xFF|0xFF|0xFF|0xFF|
In other words, construct the right inverse of movemask (why the right inverse? if f
denotes this operation, then movemask . f = id
, or f
is the inverse of movemask
on the right; note in general that there is no g
with g . movemask = id
).
I had wondered how to do this before, and saw a brief description of how to do it in this article, from which I worked out how I’d do it. The article links to a stack overflow post which has other variations. This article aims to condense and explain this information, with a simplification or two thrown in as my own small contribution. These simplifications might come at a performance cost, or they might not - I’ve not measured.
Since this is really about developing techniques and know-how, lets just focus on right inverting pmovmskb
.
Right inverting pmovmskb
SSE comparison instructions produce masks across the lanes they compare - for bytewise comparisons, truth-y comparisons produce the mask 0xff
, while false-y comparisons produce a mask 0x00
.
Our example will work by checking the ith bit is set in byte i of an SSE register.
To do this, we want two registers, one with the correct byte from the 16-bit mask loaded in each lane (spread
), and one register with the correct bit set for that byte (select
).
And-ing thes two, and then comparing lanewise that this is equal select
produces the mask we want.
select
in the above can be gotten with
_mm_set_epi8(1U << 7, 1U << 6, 1U << 5, 1U << 4,
1U << 3, 1U << 2, 1U << 1, 1U << 0,
1U << 7, 1U << 6, 1U << 5, 1U << 4,
1U << 3, 1U << 2, 1U << 1, 1U << 0);
which is independent of the mask we’re computing.
For spread
, if our 16-bit mask is comprised of two bytes AB, we want the register to look like:
AAAAAAAABBBBBBBB
which we can obtain by shuffling bytes in the 16-bit mask with shufflin indices obtained with
_mm_set_epi8(1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0);
once the mask has been moved into an SSE register (_mm_cvtsi32_si128
).
Putting this together, we get our inverse:
/*
* makemask.c
*
* Example of right inverting movemask_epi8 using SSE instructions.
*/
#include <emmintrin.h>
#include <tmmintrin.h>
#include <stdint.h>
__m128i makemask_epi8(uint16_t bits)
{
__m128i lower = _mm_cvtsi32_si128(bits);
__m128i spread_bytes = _mm_set_epi8(1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0);
__m128i spread = _mm_shuffle_epi8(lower, spread_bytes);
__m128i select = _mm_set_epi8(1U << 7, 1U << 6, 1U << 5, 1U << 4,
1U << 3, 1U << 2, 1U << 1, 1U << 0,
1U << 7, 1U << 6, 1U << 5, 1U << 4,
1U << 3, 1U << 2, 1U << 1, 1U << 0);
__m128i set = _mm_and_si128(spread, select);
return _mm_cmpeq_epi8(set, select);
}
There aren’t that many masks to check, so let’s do a basic check of all possible input values.
#include <assert.h>
int
main()
{
/* Simple check of all possible values. */
for (uint16_t ix = 0; ix < (uint16_t)-1; ++ix)
{
uint16_t iy = _mm_movemask_epi8(makemask_epi8(ix));
assert(ix == iy);
}
return 0;
}