The System V x86-64 ABI
April 12, 2025
Sometimes it’s useful to understand how a procedure receives its arguments and returns the values it computes. This protocol, part of what is known as the Application Binary Interface (ABI), is usually standardized for a given operating system and architecture. This standardization allows libraries to be linked and/or loaded into a binary (mostly) without explicitly declaring how the procedures in that library need to be called: unless specified otherwise, procedures are assumed to adhere to the standard.
In practice, most Unixes (XNU included) share the same ABI, known as the System V ABI. Today we unpack this ABI for x86-64. While the ABI is in principle fully specified in the reference document, the description there is a) algorithmic in nature, making the main principles more awkward to understand than necessary, and b) lacks a clear treatment of unions and arrays. As part of the orion compiler project (specifically its backend), I’ve had to get much more concrete about the ABI, which required enough investigation and unpacking of generated assembly code that it warranted a (hopefully) more useful writeup.
In the end, the ABI is not that difficult, but can be a bit fiddly at first. We’ll work through it in stages:
- Passing primitive (integer, floating point) parameters.
- Simple structure packing - how to pass structure values.
- Unions - when there are multiple ways to see things.
- Some examples.
- A small classifier - code to figure out the above.
But first…
x86_64 architecture overview
x86_64 (or AMD64) comes with 16 general purpose 64-bit (integer) registers (r0-r15), and 16 128-bit floating point registers (xmm0-xmm15).
The integer registers can be specified by different names to indicate they should be treated as bytes, words (16-bit integers), double words (32-bit integers) or quadwords (64-bit integers). For the most part we’ll just refer to the 64-bit variant.
The first eight integer registers also come with more convenient names:
rax = r0
rcx = r1
rdx = r2
rbx = r3
rsp = r4
rbp = r5
rsi = r6
rdi = r7
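There is also an instruction pointer, rip, but this cannot be referenced directly like the general purpose registers (although it can serve as the base of RIP-relative addressing).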
The floating point registers can be used to store 32-bit IEEE 754 floating point values, and 64-bit IEEE 754 floating point values - the same register name is used for both, but the instructions vary according to the sizes they operate on.
Some of the general purpose registers are dedicated to managing a stack (rsp and rbp, although see also -fomit-frame-pointer in e.g. gcc); indeed, “the stack” is more or less defined by where rsp points.
Pointers are integers at the level of instructions, so are treated as 64-bit integer values.
The registers xmm0-xmm15 (and indeed larger variants) can also be used for vector arguments (e.g. packed sequences of integers or floating point numbers).
We ignore vector arguments for now, although their treatment is not overly complicated (the standard documentation makes this less clear by describing everything in terms of quadwords, so the rules for vector arguments have to be reconstructed in those terms).
We also ignore x87 floating point handling, and MMX registers.
Primitive argument passing
Let’s first start with how simple primitives are to be passed. While this is straightforward, many of the rules extend to more complicated cases. We also introduce some notation to make later examples easier to describe.
The primitives are
- i8, i16, i32, i64 - the integer primitives of sizes 8 bits, 16 bits, 32 bits and 64 bits respectively. Signedness is either a) irrelevant because x86-64 uses two's complement, or b) handled at the level of instructions.
- f32 and f64 - IEEE 754 floating point values of sizes 32 bits and 64 bits respectively (in other words, float and double).
The System V ABI uses a sequence of 6 integer registers to pass integer arguments, and 8 floating point registers for floating point arguments.
In addition, the bottom byte of rax (known as al) is used to indicate how many floating point registers were used (more precisely, it is an upper bound on the number of floating point registers used). Strictly, al is only required when calling variadic functions such as printf.
These sequences are
- rdi, rsi, rdx, rcx, r8, r9 for integers
- xmm0-xmm7 for floating point arguments
The basic rule for passing parameters is to work left to right through them, assigning each the next available register for its type if one is available, and otherwise passing the argument on the stack.
For example, if
```
f : (x : i32, y : i64) -> ...
```
then we pass x and y by putting x in rdi, and y in rsi.
Here are some more examples:
```
// Example 1
(x : f32, y : f32) -> ...
x -> xmm0
y -> xmm1
2 -> al

// Example 2
(x : i32, y : f32) -> ...
x -> rdi
y -> xmm0
1 -> al

// Example 3
(a : i8, b : i8, c : i8, d : i8, e : i8, f : i8, g : i8, h : f32, i : i8) -> ...
a -> rdi
b -> rsi
c -> rdx
d -> rcx
e -> r8
f -> r9
g -> stack
h -> xmm0
i -> stack
1 -> al
```
What about return values?
Integer return values are placed in rax, and floating point values are placed in xmm0.
How are arguments passed on the stack?
As well as passing arguments into a procedure, it’s also important to be able to locate values when inside a procedure (i.e. receiving is as important as sending). It’s clear where the arguments are when they’re placed in registers, but stack arguments need a bit more detail.
The ABI requires rsp to be aligned to a 16-byte boundary when the call instruction is issued. call then pushes the return address (the address of the first byte after the call instruction) onto the stack. This means that, immediately after the call, the aligned top of the caller’s stack is at rsp + 8 (the stack grows downwards).
If a and b are arguments (in that order) that are passed on the stack, a will be located at rsp + 8, and b at rsp + 16.
All arguments are given a full quadword, regardless of size (or more generally, are quadword aligned).
Simple structure packing
The most common mental model for passing structure values as arguments is that structures are placed on the stack (indeed, this is a common model even for primitive arguments, and is true of some ABIs).
This is true for the System V x86_64 ABI for structures which exceed 16 bytes in size[1]: they are passed on the stack in the next quadword aligned chunk (relative to preceding arguments also passed on the stack).
Smaller structures (not exceeding 16-bytes) however can be packed into registers. The basic idea is to flatten out the structure into a sequence of primitives, group them into quadword sized chunks, and then allocate the quadwords as if they were primitives. If there aren’t enough registers for all (both) quadwords in a structure, spill the entire structure to the stack.
The grouping needs a little more explanation. The reference gives a lot more detail and covers a broader class of primitives, but the key points are these:
- A primitive is classified in the same way as a primitive argument, e.g. an integer primitive is considered to be an integer “group” and a floating point primitive a floating point “group”.
- A floating point group grouped with a floating point primitive is a floating point group.
- A floating point group grouped with an integer primitive becomes an integer group, and an integer group grouped with a floating point primitive remains an integer group.
- An integer group grouped with an integer primitive remains an integer group.
In other words, integer “groups” are stronger than floating point “groups”.
Some examples probably best illustrate this.
Examples
```
p :: (x : struct { a: i64, b: i64 }) -> ...
```
For a procedure p with this signature, we’d place a in one integer register, and b in a second integer register. So a would be passed in rdi, and b in rsi.
```
p' :: (x : struct { a: i32, b: i32 }) -> ...
```
For p', our argument x consists of a structure of two primitives, i32, i32. a (an i32) is classified as an integer, and b is in the same quadword, so the two combine to form an integer group. So the argument x is copied into a single integer register, rdi.
Note that if this structure is in memory, binding this argument takes a single 64-bit load.
```
p'' :: (x : struct { a: f32, b: f32 }) -> ...
```
p'' is similar to p', except our primitives are floating point: by similar reasoning, the entire argument x is passed in xmm0.
```
p''' :: (x : struct { a: f32, b: i16, c: i16, d: f32 }) -> ...
```
p''' serves to illustrate the most interesting points.
Let’s break the structure down:
```
{
  // quadword 1
  a : f32
  b : i16
  c : i16
  // quadword 2
  d : f32
}
```
Quadword 2 is straightforward and is passed in a floating point register.
What about quadword 1? Let’s classify it. a is floating point, but is then grouped with b, which is an integer, forming an integer group. c is also integral, so combines into the integer group.
x is therefore passed in a single integer register and a single floating point register, i.e. it is split across rdi and xmm0.
We haven’t described arrays (and they aren’t explicitly described in the reference), but these are handled by considering them as a sequence of primitives. By way of example,
```
p'''' :: (x : struct { a: i32, b: [2]f32 }) -> ...
```
passes x in rdi and xmm0, with the array b split across the two registers (b[0] in rdi, and b[1] in xmm0).
This is clear from the flattened structure:
```
{
  // quadword 1
  a : i32
  b[0] : f32
  // quadword 2
  b[1] : f32
}
```
The only thing left to describe is unions, but first let’s introduce a little notation, and give a bit more detail on returning structure values.
A bit of notation
At this point it’s useful for us to reduce the problem to the most interesting part. In general, if there aren’t enough registers to pass or receive a value entirely in registers, then that value is passed on the stack. Therefore, the only thing we really need to describe is how a value is passed in registers, assuming those registers are available. Indeed, all we need is the classification of each quadword that’s used.
We’ll denote
- integer registers by I
- floating point registers by F
and for completeness
- values on the stack by M
For example, we saw above that
```
x : struct { a: i32, b: i32 }
```
was passed in a single integer register. We’ll classify x as having type I.
Our more complicated example,
```
x' : struct { a: f32, b: i16, c: i16, d: f32 }
```
had classification IF.
Returning structures
For small (16 bytes or less) structures, the same classification as for arguments is used. The integer return registers are (allocated in order) rax and rdx, and the floating point registers are xmm0 and xmm1.
For structures exceeding 16 bytes, the return structure is allocated on the stack (before any arguments). As part of calling the procedure, a pointer to this return value is passed in as if it were the first argument, i.e. in rdi (shunting all other arguments along). In addition, the procedure places this pointer in rax on return.
A reasonable question is: if the return value is allocated space on the stack (and we know where), why pass in a pointer to it? Placing the return value on the stack is just a useful mental model (and indeed what a compiler might do on a first pass). If the return value is to be assigned to a (local or global) variable, however, and not used anywhere else, we can avoid a copy by just passing in the variable’s address as the first argument. In other words, taking a pointer argument for the return value lets us elide a copy.
Packing unions
The only argument kind left to discuss is unions. These are not really treated in the reference explicitly, although examining C compiler output confirms a fairly predictable handling.
The idea for unions is that each union variant is classified individually, and these are then combined using the same rules as for combining groups/primitives for structures. As a trivial example,
```
union
{
  x : f32;
  y : i32;
};
```
would classify x as floating point and y as integer, so an argument of this union type would be classified as an integer register argument, i.e. I.
As with structures, a pre-condition for packing unions into registers is that all of the primitives in all of the variants (once flattened out) are naturally aligned. One of the side-effects of this constraint is that no primitive can span a quadword boundary, so we can classify arguments in quadword chunks.
As a more complicated example, consider
```
union
{
  x : struct {
    a : f64;
    b : i64;
  };
  y : [4]f32;
};
```
Variant x classifies as FI, while y classifies as FF. Merging quadword by quadword (F with F gives F; I with F gives I), the union is therefore passed as FI.
Examples
These rules are further clarified by analysing plenty of examples. I’ve elided the names of members/variants in the below since they’re irrelevant for classification.
| Type | Classification |
|---|---|
| struct { i32, i32 } | I |
| struct { i32, i32, i32 } | II |
| struct { i32, i32, f32 } | IF |
| struct { i32, f32 } | I |
| struct { i32, struct { i32, f32 } } | IF |
| union { i32, f32 } | I |
| f32[2] | F |
| f32[5] | M |
| struct { i32, f32[2] } | IF |
| struct { f32, f32, i32 } | FI |
| struct { f32, union { f32[2], i32[2] } } | II |
| struct { f32, union { f32[2], i32 } } | IF |
| union { f32[3], i16[3] } | IF |
| union { f32[3], i16[5] } | II |
A classifier
Once an argument / return type can be classified, registers can be allocated as per the ABI, so we’ll focus on classifying a single argument in isolation. Let’s write some code to compute this classification.
This will be in C, rather than the pseudocode I’ve been using up to this point to describe interfaces. While this is adapted from my compiler, some of the procedures have been replaced to make the implementation a little clearer, and to remove dependencies on compiler infrastructure that wouldn’t benefit the explanation here. These procedures lack definitions, but it’s hopefully clear what they do.
As above, we’ll also elide vector types: the complete set of primitives for our purposes is i8, i16, i32 and i64 for integer types, and f32 and f64 for floating point types.
Extending to vector types need not be difficult, but it reduces clarity.
First some data types to represent the result of this classification:
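```c
#include <stdint.h>

/* Order matters: merging two classes keeps the larger value, so INTEGER
 * must sort above SSE, which must sort above NO_CLASS. */
enum
{
    ABI_SYSV_CLASS_NO_CLASS,
    ABI_SYSV_CLASS_SSE,
    ABI_SYSV_CLASS_INTEGER,
    ABI_SYSV_CLASS_MEMORY,
};

/* Classification of (up to) two quadwords. */
struct abi_sysv_class
{
    uint8_t class[2];
};
```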
The enum represents the variants which describe a 2-quadword sequence for an argument. ABI_SYSV_CLASS_SSE is another way of saying floating point, so
```c
(struct abi_sysv_class){
    .class[0] = ABI_SYSV_CLASS_SSE,
    .class[1] = ABI_SYSV_CLASS_INTEGER,
}
```
encodes an FI argument.
An argument to be passed on the stack (i.e. M) is described by
```c
(struct abi_sysv_class){
    .class[0] = ABI_SYSV_CLASS_MEMORY,
    .class[1] = ABI_SYSV_CLASS_NO_CLASS,
}
```
To help with merging groups of primitives, we define a small helper:
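```c
/* Merge two classes. Thanks to the enum's ordering, the larger class wins:
 * INTEGER beats SSE, and anything beats NO_CLASS. */
static inline uint8_t
abi__sysv_merge(uint8_t a, uint8_t b)
{
    return (a > b ? a : b);
}
```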
The uint8_t values in here are expected to be values from the ABI_SYSV_CLASS_* enumeration.
Now we get to the classifier.
We’ll examine the details afterward, but the core idea is to iterate along each of the primitives, group them, and merge the groups.
Classifying primitives (integers, floating point) is straightforward, so we handle those first.
For more complex types, we proceed in two stages: 1) find the next primitive n, and then 2) merge n into the current classification state.
Since structures, unions and arrays can all nest, we must sometimes descend multiple layers to find the next primitive, and in the case of unions, rewind to the beginning to work on the next variant. To keep track of this work, we use a small stack, saving progress at each level so we can resume when subtypes have been handled. In this case, the stack is statically sized, and the procedure panics if the stack is filled. This can happen on deeply nested structs/unions, but is probably so rare in practice that this is a reasonable way to handle it. A dynamic array could be used if you really want to handle this case.
```c
#include <assert.h>
#include <stdint.h>

#define QUADWORD_SIZE 8

typedef uint32_t typeid;

static struct abi_sysv_class
abi_sysv_classify(typeid t)
{
    struct abi_sysv_class c = { 0 };

    static const uint8_t prim_class[] = {
        [TYPE_VOID] = ABI_SYSV_CLASS_NO_CLASS,
        [TYPE_I8] = ABI_SYSV_CLASS_INTEGER,
        [TYPE_I16] = ABI_SYSV_CLASS_INTEGER,
        [TYPE_I32] = ABI_SYSV_CLASS_INTEGER,
        [TYPE_I64] = ABI_SYSV_CLASS_INTEGER,
        [TYPE_F32] = ABI_SYSV_CLASS_SSE,
        [TYPE_F64] = ABI_SYSV_CLASS_SSE,
    };

    /* Primitives classify directly into the first quadword. */
    if (type_is_primitive(t))
    {
        assert(t < static_array_length(prim_class));
        c.class[0] = prim_class[t];
        return c;
    }

    /* Anything larger than two quadwords (or over-aligned) goes in memory. */
    struct type_info type_info = type_read_info(t);
    if (type_info.size > 2 * QUADWORD_SIZE || type_info.align > QUADWORD_SIZE)
    {
        c.class[0] = ABI_SYSV_CLASS_MEMORY;
        return c;
    }

    /* Byte offset within t reached so far. */
    uint8_t used = 0;

    /* One aggregate (struct, union or array) currently being walked. */
    struct item
    {
        typeid type;
        uint32_t done; /* members / variants / elements visited so far */
        uint8_t base;  /* offset of this aggregate within t */
    };
    int sp = 0;
    struct item stack[16];
    stack[sp].type = t;
    stack[sp].done = 0;
    stack[sp].base = 0;
    sp++;

    while (sp > 0)
    {
        struct item *next = &stack[sp - 1];
        struct type_info ti = type_read_info(next->type);
        typeid n = TYPE_VOID;

        if (ti.kind == TYPE_STRUCT)
        {
            struct type_list members = type_read_struct(next->type);
            assert(members.count <= 2 * QUADWORD_SIZE);
            if (next->done == members.count)
            {
                /* Finished: skip any trailing padding before continuing. */
                used = next->base + ti.size;
                sp--;
                continue;
            }
            n = members.type[next->done++];
        }
        else if (ti.kind == TYPE_UNION)
        {
            struct type_list variants = type_read_union(next->type);
            if (next->done == variants.count)
            {
                /* Finished: continue after the whole union, not just after
                 * the last variant visited. */
                used = next->base + ti.size;
                sp--;
                continue;
            }
            /* Consider the next variant - rewind to the beginning of where
             * we started looking at the union, and consider the next variant */
            used = next->base;
            n = variants.type[next->done++];
        }
        else if (ti.kind == TYPE_ARRAY)
        {
            uint64_t count = type_read_array_length(next->type);
            assert(count <= 2 * QUADWORD_SIZE);
            if (next->done == count)
            {
                used = next->base + ti.size;
                sp--;
                continue;
            }
            next->done++;
            n = type_read_array_elements(next->type);
        }
        else
        {
            INVALID_BRANCH;
        }

        assert(n != TYPE_VOID);

        /* Next to emit is n - align used to n's alignment */
        struct type_info n_info = type_read_info(n);
        assert(n_info.align <= QUADWORD_SIZE);
        used = (used + (n_info.align - 1)) & ~(n_info.align - 1);

        /* Aggregates are descended into; their primitives are merged later. */
        if (!type_is_primitive(n))
        {
            panic_on(sp == (int)static_array_length(stack),
                     "overly nested structure for register packing");
            stack[sp].type = n;
            stack[sp].done = 0;
            stack[sp].base = used;
            sp++;
            continue;
        }

        /* n is the next primitive to merge, into whichever quadword it's in. */
        assert(type_is_primitive(n));
        assert(n_info.size <= QUADWORD_SIZE);
        assert(used <= 2 * QUADWORD_SIZE - n_info.size);
        if (used < QUADWORD_SIZE)
        {
            c.class[0] = abi__sysv_merge(c.class[0], prim_class[n]);
        }
        else
        {
            c.class[1] = abi__sysv_merge(c.class[1], prim_class[n]);
        }
        used += n_info.size;
    }

    return c;
}
```
[1] Structures containing vector arguments may still be passed in registers despite exceeding 16 bytes. Structures of 16 bytes or less containing unaligned members will be passed on the stack.