Packing small sparse arrays for extension data
November 15, 2025
This post documents a small technique I invented (in the sense that I
haven’t seen if before, although I wouldn’t be surprised if there is
prior art) for packing small sparse arrays into dense arrays. It’s
part of the class of techniques using bit operations (on x86_64 for
example tzcnt, popcnt) not natively supported in C/C++ and which
require use of compiler builtins or intrinsics.
Motivating example
The problem that got me thinking about this involved adding attributes to procedures/global data in my compiler. Most procedures don’t have any special attributes attached to them,
f :: (x : i32) -> i32
x : i32 = 42;
for which defaults should be assumed. Others may have a variety of optional attributes which impact code generation in a variety of ways.
@inline
@section(".text.procs")
f :: (x : i32) -> i32
@align(64)
y : i32 = -42;
Some of these attributes are effectively booleans (@inline), while
others have additional data specified (@section(".text.procs"),
align(64)).
Procedures/global data are described by fixed size structures, so attributes need to either all be bundled into that structure (while mostly unused), or there needs to be a reference to lookup these attributes when present.
It might be completely reasonable in practice to keep all of this data right there in the structure (depending on the dominant access pattern), but I was contemplating the approaches that could be taken to keep it aside (assuming its access is cold), and not use more storage than really necessary.
The most basic approach might be to have an array of attributes attached to each procedure (or even a linked list if there is merit to each attribute occupying different amounts of space), and then iterating these attributes when they need looking up (there are unlikely to be many attributes). To save hitting this memory for attributes that aren’t there, flags can be set in the core structure to indicate which attributes are present. First check the flags, and then search the array for the correct attribute and data.
Note that using flags has the bonus of being ideal for storing boolean
attributes (like @inline).
We’ll be pursuing the flags approach much further - for the purposes of this writeup, assume flags are stored in a 32-bit integer. This can be extended quite naturally.
However, it would be nice to jump straight to a piece of attribute data, without having to search an array for the correct piece. This also has the advantage that the kind of attribute doesn’t need to be stored with the data (in other words, attributes don’t need to be tagged).
One way to do this is to make the array sparse
struct attr_data attrs[32];
Where attrs[i] contains the data corresponding to the ith flag set
in the attributes flags. This allows lookup without needing to
iterate the array (and so fewer data dependencies), but wastes space
on boolean attributes which are never used. It’s also all or nothing;
the entire array is needed whenever any attribute data needs to be
tracked.
Flag indexing
We continue with the approach of using flags to track presence of attributes.
For the moment, let’s assume all attributes have associated data we
need to keep. For a set of flags flags, if some number n of bits
are set, it would be ideal if we could allocate an array of n pieces
of attribute data, where the ith piece corresponds to the ith set
bit in flags.
The key to achieving this is knowing there are instructions which count set bits on most modern architectures.
The following procedure helps us do the job.
#define popcount32 __builtin_popcount
int
flag_idx(uint32_t flags, uint32_t pick)
{
assert((pick & (pick - 1)) == 0); /* Pick is a single set bit */
assert(flags & pick); /* pick is set in flags */
uint32_t mask = pick + (pick - 1);
return popcount32(flags & mask) - 1;
}
flags is our collection of bits indicating which procedures are
present, and pick is an individual one of those flags which is
set. If pick is the ith set bit, this procedure returns i.
A few assertions provide some examples of what to expect, and make everything more concrete.
{
uint32_t flags = 1 << 0 | 1 << 3 | 1 << 5;
assert(flag_idx(flags, 1 << 0) == 0);
assert(flag_idx(flags, 1 << 3) == 1);
assert(flag_idx(flags, 1 << 5) == 2);
assert(flag_idx(flags | 1 << 2, 1 << 5) == 3);
}
The core of flag_idx compiles to a sequence of very basic
instructions (gcc will need -mpopcnt to avoid synthesizing the
operation popcnt instruction):
lea -0x1(%rsi,%rsi,1),%eax
and %edi,%eax
popcnt %eax,%eax
sub $0x1,%eax
ret
Better still, it’s likely the compiler will inline flag_idx and
reduce the above assembly even futher when static values are used.
We can use this to store atttribute data packed in an array. If bit
i is set in flags, then the data corresponding to bit i will be
found in
attr_data[flag_idx(flags, 1 << i)]
We started with the assumption that all attributes had data associated
with them. We can now remove that assumption by masking off only those
flags which have data associated. For example if DATA_MASK is a mask
of a bits whose attributes have data, we can locate the data by
adapting the above to
attr_data[flag_idx(flags & DATA_MASK, 1 << i)]
A worked example of lookup
This example shows a mocked up version of the motivating example. In real life this would be encoded differently – it’s just intended to show the idea. There are various ways the attribute data could be stored; here we use an index into a global array to indicate where to start looking for attributes, if there are any.
We start by defining the data types and tables, as well as the flags
we use to mark presence of the different attributes (whether the
procedure should be force inlined, whether it has a non-default
section, and whether it has a non-default ABI). Only two of these have
any associated data which is captured by ATTR_DATA_MASK.
struct global {
const char *name;
uint32_t attr_start_idx;
uint32_t flags;
};
struct attr {
union {
const char *section;
const char *abi;
};
};
#define ATTRS_MAX 64
#define GLOBALS_MAX 32
struct global globals[GLOBALS_MAX] = { 0 };
struct attr attrs[ATTRS_MAX] = { 0 };
int global_count = 0;
int attr_count = 0;
enum {
FLAG_FORCE_INLINE,
FLAG_SECTION,
FLAG_ABI,
};
static const uint32_t ATTR_DATA_MASK = FLAG_SECTION | FLAG_ABI;
Now lets put some examples in the tables (encoded by hand).
/* Example illustrating the default case - no attributes. */
{
struct global *g = &globals[global_count++];
g->name = "basic";
}
/* Two attributes. */
{
struct global *g = &globals[global_count++];
g->name = "interrupt_handler";
g->flags = FLAG_ABI|FLAG_SECTION;
g->attr_start_idx = attr_count;
attrs[attr_count++].section = ".text.interrupt";
attrs[attr_count++].abi = "interrupt abi";
}
/* No attributes with data. */
{
struct global *g = &globals[global_count++];
g->name = "inline";
g->flags = FLAG_FORCE_INLINE;
}
/* Single attribute with data. */
{
struct global *g = &globals[global_count++];
g->name = "custom_abi";
g->flags = FLAG_ABI;
g->attr_start_idx = attr_count;
attrs[attr_count++].abi = "custom abi";
}
Finally we loop over the globals, and show how to retrieve data from
the attributes (here we just look at the ABI). Note in particular that
the “interrupt handler”, and the “custom abi” globals will have
the abi data stored at a differed offset from the start index in the
attribute array (1 and 0 respectively).
printf("globals: %d, attrs: %d\n", global_count, attr_count);
for (int i = 0; i < global_count; ++i)
{
const struct global *g = &globals[i];
printf("%s: ", g->name);
if (g->flags & FLAG_ABI)
{
uint32_t attr_idx = g->attr_start_idx + flag_idx(g->flags & ATTR_DATA_MASK, FLAG_ABI);
printf("%s\n", attrs[attr_idx].abi);
}
else
{
printf("default abi\n");
}
}
The following is the output:
globals: 4, attrs: 3
basic: default abi
interrupt_handler: interrupt abi
inline: default abi
custom_abi: custom abi
Note that we’ve only used three attribute slots – exactly the number of attributes with associated data – and that whatever the number of attributes used for a global, only 64-bits is consumed in the main structure. We only look in the attributes array when there is data actually present there that we need to read.
It’s also worth noting that the “type” of the attribute is determined precisely by the set bits in flags.
Addendum: Packing/unpacking of small sparse arrays with flags
A useful utility for packing/unpacking a sparse array into a
packed/unpacked array uses tzcnt on x86_64, accessed through the
__builtin_ctz builtin.
#define ctz32 __builtin_ctz
int
indices_by_flags(uint32_t flags, uint8_t index[32])
{
int count = 0;
for (uint32_t live = flags; live != 0; live &= live - 1)
{
index[count++] = ctz32(live);
}
return count;
}
indices_by_flags returns the number of bits set in flags, and the
ith slot of the indices array is populated with the ith bit number.
The following example shows how this can be used
{
uint8_t indices[32];
int count = 0;
count = indices_by_flags((1 << 3) | (1 << 4), indices);
assert(count == 2);
assert(indices[0] == 3);
assert(indices[1] == 4);
/* Creating a packed version of an array. */
for (int i = 0; i < count; ++i)
{
packed[i] = unpacked[indices[i]];
}
/* Unpacking a packed array. */
for (int i = 0; i < count; ++i)
{
unpacked[indices[i]] = packed[i];
}
}
where packed and unpacked are suggestively named variants of some
array-of-structures type.