Wirecode - simple TLV decoding

August 22, 2025

Wirecode is a library and wire format I’ve been working on for simple serialization and deserialization of data in C. It can be used as a single header library (provided the compiler supports the non-standard mandatory tail call optimization – in practice clang is the only compiler I’m aware of at the moment that does), or used with an object file of optimized assembly routines.

Aside from the library, the techniques are simple enough that they can be used on the fly in many projects.

I learned this technique from upb, an experimental protobuf decoder. Protobuf has not traditionally been very well suited for use in C (and in other languages the generated interfaces often feel clunky to me) – upb seems to more or less solve this. I realised, however, that most of the time a wire format is needed, there is quite a limited set of things that need encoding. This prompted the creation of a minimal wire format which I could trivially include in projects for cheap serialization/deserialization. The format is heavily influenced by protobuf, but I intend to experiment with it further to strike a good balance between performance, compactness (of encoded data) and simplicity. As such, it shouldn’t yet be considered stable!

My first use cases will probably be dumping compiler diagnostics and reading them back into tools for visualisation. This is a good fit - the data isn’t specialized or important enough to warrant a custom format crafted for each use, yet a ready-made format drops the cost of reading and writing it, which is good for experimentation. Of course, I expect I’ll also find use for this as a network format.

A snapshot of the header file can be found here.

Wirecode format

Wirecode is a tag-length-value encoding. Tag-length-value encodings are composed of three things - a tag to indicate what a piece of data is, a length capturing the size of the (encoded) value, and the actual encoded value.

Many kinds of data have a fixed, known length (integers, bytes), so instead of requiring an explicitly encoded length every time, we use a wire format field (as protobuf does), which we bundle alongside the tag.

enum
{
    WC_WIRE_FORMAT_BYTE,
    WC_WIRE_FORMAT_I32,
    WC_WIRE_FORMAT_SUBMESSAGE,
};

WC_WIRE_FORMAT_BYTE and WC_WIRE_FORMAT_I32 have a predetermined length of the encoded value (1 byte and 4 bytes respectively). For other fields, or those with variable length, WC_WIRE_FORMAT_SUBMESSAGE can be used. It requires the tag/wire format byte to be followed by a length, encoded as a 32-bit unsigned little-endian integer (to match modern hardware).

As with protobuf, having a small set of wire formats allows us to skip fields we don’t recognize for backward compatibility.
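To make the skipping rule concrete, here is a minimal sketch of how a reader could step over a field using only its wire format. It illustrates what the format makes possible and is not taken from the library. The names `load_u32le` and `skip_field` are hypothetical, the numeric wire format values assume the declaration order of the enum above, and the length is assumed to count only the payload bytes that follow it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: the declaration order of the enum in the text. */
enum { WF_BYTE = 0, WF_I32 = 1, WF_SUBMESSAGE = 2 };

/* Read a 32-bit little-endian unsigned integer one byte at a time, so the
 * result does not depend on host endianness or alignment. */
static uint32_t load_u32le(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return a pointer just past the field starting at pb, or NULL if the field
 * is truncated or uses a wire format this sketch does not know. */
static const uint8_t *skip_field(const uint8_t *pb, const uint8_t *pe)
{
	uint64_t avail, need;

	assert(pb <= pe);
	avail = (uint64_t)(pe - pb);
	if (avail < 1)
		return NULL;

	switch (*pb & 0x7u) {
	case WF_BYTE:
		need = 1 + 1;            /* leading byte + 1 value byte */
		break;
	case WF_I32:
		need = 1 + 4;            /* leading byte + 4 value bytes */
		break;
	case WF_SUBMESSAGE:
		if (avail < 1 + 4)
			return NULL;
		/* leading byte + 4 length bytes + payload */
		need = (uint64_t)1 + 4 + load_u32le(pb + 1);
		break;
	default:
		return NULL;
	}
	return need <= avail ? pb + (size_t)need : NULL;
}
```

The tag never enters the calculation, which is why a reader does not need to understand a field in order to step over it.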

The tag and wire format are bundled into a single byte - we use 3 bits for the wire format, and 5 bits for the tag,

(tag << 3) | wire_format

meaning wirecode supports decoding 32 fields in a single decoder, and up to 8 wire formats. This alone is sufficient for most use cases. Nonetheless, decoders can be layered, so a tree of decoders can be used if we want to represent more than 32 fields.
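As a small illustration of the bundling, the helpers below pack and unpack the leading byte. The helper names are hypothetical (the library has its own macro, WC_LBYTE, whose definition is not shown here), and the numeric wire format values assume the declaration order of the enum above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: the declaration order of the enum in the text. */
enum { WF_BYTE = 0, WF_I32 = 1, WF_SUBMESSAGE = 2 };

/* Pack a 5-bit tag and a 3-bit wire format into one leading byte. */
static uint8_t lead_byte(unsigned tag, unsigned wire_format)
{
	assert(tag < 32 && wire_format < 8);
	return (uint8_t)((tag << 3) | wire_format);
}

/* The tag lives in the upper 5 bits. */
static unsigned lead_tag(uint8_t lead)
{
	return lead >> 3;
}

/* The wire format lives in the lower 3 bits. */
static unsigned lead_wire_format(uint8_t lead)
{
	return lead & 0x7u;
}
```

For example, tag 2 with the I32 wire format packs to `0x11`: `2 << 3` is `0x10`, and the wire format contributes the low bit.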

Shifting the tag 3 to the left has a payoff which is only clear in the assembly version - logically tag << 3 is 8 * tag, that is, the tag scaled by the quadword size (8 bytes) - this makes indexing into decoding structures easier and saves a multiplication / shift. It turns out protobuf also does this, but I don’t know for what reason (the initial version of wirecode put the tag in the lower bits).
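The arithmetic can be shown in C, even though the benefit only really appears in assembly. In this sketch the table of 8-byte entries and the name `entry_for` are hypothetical, and the library's real decoding structures may be laid out differently. Masking off the wire format bits leaves a value that is already a byte offset into such a table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table: 32 entries of 8 bytes each, one per tag. */
static uint64_t entry_for(const uint64_t table[32], uint8_t lead)
{
	uint64_t e;
	/* Clearing the low 3 bits leaves tag << 3, which is already tag * 8. */
	size_t byte_offset = lead & 0xF8u;

	assert(byte_offset == (size_t)(lead >> 3) * sizeof(uint64_t));
	memcpy(&e, (const unsigned char *)table + byte_offset, sizeof e);
	return e;
}
```

With the tag in the lower bits instead, the same lookup would need a mask followed by a shift or scaled index.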

The “meaning” of encoding fields is captured purely by the decoding procedures. The wire format only describes how big the encoded data is. Users can provide their own decoding procedures should they wish to do something a bit different. For example, WC_WIRE_FORMAT_I32 can be used to pack two 16-bit big-endian encoded integers with the right encoding/decoding procedure.
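As a sketch of that last example, the two helpers below pack a pair of 16-bit integers into the 4 value bytes of an I32 field and unpack them again. The helper names are hypothetical and are not part of the library; a real custom decoding procedure would also have to follow the decoder signature shown in the next section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a and b as two big-endian 16-bit integers into 4 value bytes. */
static void pack_u16be_pair(uint8_t out[4], uint16_t a, uint16_t b)
{
	out[0] = (uint8_t)(a >> 8);
	out[1] = (uint8_t)(a & 0xFFu);
	out[2] = (uint8_t)(b >> 8);
	out[3] = (uint8_t)(b & 0xFFu);
}

/* Read the two big-endian 16-bit integers back out of 4 value bytes. */
static void unpack_u16be_pair(const uint8_t in[4], uint16_t *a, uint16_t *b)
{
	assert(a != NULL && b != NULL);
	*a = (uint16_t)((in[0] << 8) | in[1]);
	*b = (uint16_t)((in[2] << 8) | in[3]);
}
```

A reader that does not know about this convention can still skip or copy the field, because the wire format says only that 4 bytes follow.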

Decoding

Encoding is always easier than decoding. The method for encoding wirecode is very similar to that for decoding, so we focus on decoding.
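For completeness, here is a minimal sketch of what encoding the three wire formats could look like. It is not the library's encoder: the function names are hypothetical, the numeric wire format values assume the declaration order of the enum above, and the I32 value is assumed to be stored little-endian like the submessage length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values: the declaration order of the enum in the text. */
enum { WF_BYTE = 0, WF_I32 = 1, WF_SUBMESSAGE = 2 };

static void store_u32le(uint8_t *p, uint32_t u)
{
	p[0] = (uint8_t)(u & 0xFFu);
	p[1] = (uint8_t)((u >> 8) & 0xFFu);
	p[2] = (uint8_t)((u >> 16) & 0xFFu);
	p[3] = (uint8_t)(u >> 24);
}

/* Each function returns the new write position, or NULL if out of space. */
static uint8_t *put_byte(uint8_t *p, const uint8_t *end, unsigned tag, uint8_t v)
{
	assert(tag < 32 && p <= end);
	if (end - p < 2)
		return NULL;
	p[0] = (uint8_t)((tag << 3) | WF_BYTE);
	p[1] = v;
	return p + 2;
}

static uint8_t *put_i32(uint8_t *p, const uint8_t *end, unsigned tag, int32_t v)
{
	assert(tag < 32 && p <= end);
	if (end - p < 5)
		return NULL;
	p[0] = (uint8_t)((tag << 3) | WF_I32);
	store_u32le(p + 1, (uint32_t)v);
	return p + 5;
}

static uint8_t *put_submessage(uint8_t *p, const uint8_t *end, unsigned tag,
                               const uint8_t *body, uint32_t len)
{
	assert(tag < 32 && p <= end);
	if ((uint64_t)(end - p) < (uint64_t)5 + len)
		return NULL;
	p[0] = (uint8_t)((tag << 3) | WF_SUBMESSAGE);
	store_u32le(p + 1, len);         /* length of the payload only */
	if (len != 0)
		memcpy(p + 5, body, len);
	return p + 5 + len;
}
```

Each call appends one field, so a message is simply a chain of calls that thread the write position through.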

The core of the decoding loop essentially consists of two parts:

1. Extract the tag, and look up the decoding procedure and wire format in a table.
2. If the tag is valid (has a decoding), check the wire format is correct, and jump to a decoding procedure for that tag.

To avoid procedure call overhead, the entire process makes use of tail calls, and state is passed as parameters which are guaranteed to be passed in registers.
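The sketch below shows the general shape of this on a toy problem, summing the bytes of a buffer. `TAIL_CALL` is a hypothetical stand-in for the library's `wc_tail_call`, whose definition is not shown in this post. It uses the `musttail` statement attribute when the compiler advertises it and falls back to a plain `return` otherwise, in which case the tail call becomes an optimization rather than a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for wc_tail_call. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define TAIL_CALL(call) __attribute__((musttail)) return call
#  endif
#endif
#ifndef TAIL_CALL
#  define TAIL_CALL(call) return call
#endif

/* Every step shares one signature, so all state stays in parameters and a
 * guaranteed tail call is possible between any two steps. */
static uint64_t sum_done(const uint8_t *pb, const uint8_t *pe, uint64_t acc)
{
	(void)pb;
	(void)pe;
	return acc;
}

static uint64_t sum_step(const uint8_t *pb, const uint8_t *pe, uint64_t acc)
{
	assert(pb <= pe);
	if (pb == pe) {
		TAIL_CALL(sum_done(pb, pe, acc));
	}
	TAIL_CALL(sum_step(pb + 1, pe, acc + *pb));
}
```

With the attribute in effect, each step replaces the current frame instead of pushing a new one, so the stack does not grow with the length of the input.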

This is (approximately) the main decoder:

int
wc__decode(struct wc_dvm *vm, void *out, wc_u64 data, const wc_u8 *pb, const wc_u8 *pe, wc_u64 dbits)
{
	if (pb == pe)
		wc_tail_call(wc__decode_complete(vm, out, data, pb, pe, dbits));

	wc_u8 tag = *pb >> 3;
	const struct wc_field_decoder *f = &vm->d->field[tag];

	if ((void *)f->decode == WC_NULL)    /* This is not a recognized tag */
		return WC_DECODE_ERR;

	if (f->lbyte != *pb)                 /* Wrong wire format */
		return WC_DECODE_ERR;

	if (dbits & (1ULL << tag))           /* Have we seen this tag before? */
		return WC_DECODE_ERR;

	wc_tail_call(f->decode(vm, out, f->data, pb, pe, dbits | (1ULL << tag)));
}

The vm structure holds the decoding table, and helps with decoding submessages. The name is suggestive - the processing here treats tags like opcodes, and submessages create new decoding “frames”, somewhat like a procedure call in an interpreter. Decoding can be thought of as an interpreter whose operations are determined by a table.

pb and pe point to the beginning and end of the message to be decoded. Various pieces of data are packed into the data field - for simple fields this will be a 32-bit memory offset from the out pointer, corresponding to where a field should be written once decoded. More complicated values, for example arrays, might have two offsets bundled in here - one for writing the length output, and one for writing the array base pointer. The interpretation of this data is left to the decoder.

dbits stores (as a bitfield) the tags which have been seen already. Unlike protobuf, wirecode doesn’t tolerate repeated fields. dbits is also checked in wc__decode_complete for missing tags - the decoding table contains a bitfield of required fields/tags. Limiting to 32 tags means we’re only using 32 bits of this field.
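To tie these pieces together, here is a deliberately simplified, self-contained sketch of a table-driven decoder that writes values at offsets from an output pointer and tracks seen and required tags in a bitfield. It uses an ordinary loop rather than tail calls, handles only byte and I32 fields, and assumes I32 values are little-endian. `struct field` and `decode_fields` are hypothetical and are not the library's types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values: the declaration order of the enum in the text. */
enum { WF_BYTE = 0, WF_I32 = 1 };

/* Hypothetical table entry, much simpler than the library's. */
struct field {
	uint8_t  known;   /* non-zero if this tag has a decoding */
	uint8_t  lbyte;   /* expected leading byte: (tag << 3) | wire format */
	uint32_t offset;  /* where the value lands, relative to out */
};

struct s {
	uint8_t b[2];
	int32_t i32;
};

static const struct field s_fields[32] = {
	[0] = { 1, (0 << 3) | WF_BYTE, (uint32_t)offsetof(struct s, b[0]) },
	[1] = { 1, (1 << 3) | WF_BYTE, (uint32_t)offsetof(struct s, b[1]) },
	[2] = { 1, (2 << 3) | WF_I32,  (uint32_t)offsetof(struct s, i32) },
};

/* Returns 0 on success and -1 on any decode error. */
static int decode_fields(const struct field *tbl, uint32_t required, void *out,
                         const uint8_t *pb, const uint8_t *pe)
{
	uint32_t seen = 0;

	assert(pb <= pe);
	while (pb != pe) {
		unsigned tag = *pb >> 3;
		const struct field *f = &tbl[tag];
		uint32_t bit = (uint32_t)1 << tag;
		uint8_t *dst = (uint8_t *)out + f->offset;
		size_t n;

		if (!f->known || f->lbyte != *pb || (seen & bit))
			return -1;   /* unknown tag, wrong wire format, or repeat */
		n = (*pb & 0x7u) == WF_BYTE ? 1 : 4;
		pb++;
		if ((size_t)(pe - pb) < n)
			return -1;   /* truncated value */
		if (n == 1) {
			*dst = *pb;
		} else {
			uint32_t u = (uint32_t)pb[0] | ((uint32_t)pb[1] << 8) |
			             ((uint32_t)pb[2] << 16) | ((uint32_t)pb[3] << 24);
			memcpy(dst, &u, sizeof u);
		}
		pb += n;
		seen |= bit;
	}
	/* Every required tag must have been seen by the end of the message. */
	return (seen & required) == required ? 0 : -1;
}
```

Calling `decode_fields(s_fields, 0x7, &v, buf, buf + len)` fills a `struct s` and rejects messages in which any of tags 0 to 2 is missing or repeated.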

With this, the entire process of decoding can be handled by writing small and specialized decoders. A decoder for a simple byte looks like the following:

int
wc__decode_byte(struct wc_dvm *vm, void *out, wc_u64 data, const wc_u8 *pb, const wc_u8 *pe, wc_u64 dbits)
{
	wc_assert((*pb & 0x7) == WC_WIRE_FORMAT_BYTE, "wire-type should be WC_WIRE_FORMAT_BYTE");
	pb++;

	wc_u8 *pout = (wc_u8 *)out + WC_DATA_UPCKI32_0(data);

	if (pb == pe)
		return WC_DECODE_ERR;

	*pout = *pb++;

	wc_tail_call(wc__decode(vm, out, 0, pb, pe, dbits));
}

The library also has (encoders and) decoders for 32-bit integer values, submessages, and arrays of submessages.

Decoding arrays requires an arena for memory allocation, which can be provided by the library user.
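The arena interface the library expects is not shown in this post, so the following is only a sketch of the kind of allocator that would fit: a bump allocator over a caller-provided buffer. `struct arena` and `arena_alloc` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump arena over a fixed, caller-owned buffer. */
struct arena {
	uint8_t *base;  /* start of the buffer */
	size_t   cap;   /* total bytes available */
	size_t   used;  /* bytes handed out so far, including padding */
};

/* Allocate size bytes aligned to align (a power of two).
 * Returns NULL when the arena is exhausted; nothing is ever freed. */
static void *arena_alloc(struct arena *a, size_t size, size_t align)
{
	uintptr_t cur, pad;

	assert(align != 0 && (align & (align - 1)) == 0);
	cur = (uintptr_t)a->base + a->used;
	pad = (0 - cur) & (align - 1);   /* bytes needed to reach alignment */
	if (pad > a->cap - a->used || size > a->cap - a->used - pad)
		return NULL;
	a->used += pad + size;
	return a->base + (a->used - size);
}
```

Because decoded arrays and structures share the lifetime of the message they came from, releasing everything at once by resetting `used` to zero is usually all the deallocation that is needed.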

Interface

The interface is a bit primitive at the moment, but will be given a procedural overhaul at some point. At the moment, the tables need to be built by hand (with a few helper macros). Here is an example decoding table for a simple structure.

struct s
{
	uint8_t b[2];
	int32_t i32;
};

static const struct wc_decoder s_decoder = {
	.align  = alignof(struct s),
	.size   = sizeof(struct s),
	.field  = {
		[0] = {
			.decode         = &wc__decode_byte,
			.data           = WC_DATA_PCKI32(offsetof(struct s, b[0]), 0),
			.lbyte          = WC_LBYTE(0, WC_WIRE_FORMAT_BYTE)
		},
		[1] = {
			.decode         = &wc__decode_byte,
			.data           = WC_DATA_PCKI32(offsetof(struct s, b[1]), 0),
			.lbyte          = WC_LBYTE(1, WC_WIRE_FORMAT_BYTE)
		},
		[2] = {
			.decode         = &wc__decode_i32,
			.data           = WC_DATA_PCKI32(offsetof(struct s, i32), 0),
			.lbyte          = WC_LBYTE(2, WC_WIRE_FORMAT_I32)
		},
	}
};

Entries in the decoding table can point to other decoding tables to decode substructures.

A buffer buf of length len can be decoded with

	struct s *v = wc_decode(&s_decoder, len, buf, &a);

where a is an arena to allocate the structure into. There is also an “in place” variant which can be used to deserialize directly into a block of memory (for example, a stack allocated structure). Passing a NULL pointer as the arena prevents any allocation (and will cause a decode error if any is needed).

Next steps

I want to see how this finds real use before enhancing it too much further. There are clearly a range of improvements and enhancements that could be made, but this is already quite a useful little library, and real world use will shape it best. I’m not considering it done just yet; once I’ve tried it in a few things (and made any tweaks), I’ll solidify the format.

This also shows it’s quite easy to create small custom formats for data (especially if the format doesn’t need to be optimized too much). The technique is probably more valuable than the library!