Inlinable pinned OpenBSD syscalls

March 8, 2025

OpenBSD can restrict the addresses from which syscalls may originate. Over the last few days there have been some interesting demos here and here, demonstrating how to do this with statically linked binaries. This is nice if you want to make use of this security feature (if you believe it has value) but don’t want to use libc 1.

The way this feature appears to work is that if your OpenBSD binary contains a section called .openbsd.syscalls, then OpenBSD will verify that every syscall comes from one of the sites listed in this section. The section itself is a list of address and syscall number pairs.
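
As a rough mental model, and assuming each entry is a pair of 32-bit values (which is what the .long directives in the examples in this post emit), the section can be pictured as an array of the struct below. The names pin_entry and pin_allowed are mine, not OpenBSD's, and the lookup only models the check described above; it is not how the kernel implements it.

```c
#include <stdint.h>

/* Hypothetical view of one .openbsd.syscalls entry (names are illustrative). */
struct pin_entry {
        uint32_t site;   /* where the syscall instruction lives */
        uint32_t sysno;  /* syscall number permitted at that site */
};

/* Model of the check: is this (site, sysno) pair listed in the table? */
static int
pin_allowed(const struct pin_entry *tab, unsigned n,
    uint32_t site, uint32_t sysno)
{
        for (unsigned i = 0; i < n; i++) {
                if (tab[i].site == site && tab[i].sysno == sysno)
                        return 1;
        }
        return 0;
}
```

Under this model a syscall issued from an unlisted site, or a listed site issuing a different syscall number, is rejected.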

Wellons’ example is already great, and all that’s needed in practice, but I wanted to document how to make these syscall procedures inlinable, as a way of demonstrating an inline assembly feature I learned while studying and trying to recreate the SystemTap tracepoint macros. It’s debatable whether there is any real value in inlining system calls - they are already expensive compared with the procedure call overhead.

In any case, this allows us to let the compiler decide what to do.

Wellons’ version of write, for example, is as follows:

__attribute((noinline))
long w(void *what, long len)
{
	char err;
	long rax = 4;  // SYS_write

	asm volatile (
		"_w: syscall"
		: "+a"(rax), "+d"(len), "=@ccc"(err)
		: "D"(1), "S"(what)
		: "rcx", "r11", "memory"
	);

	return err ? -rax : rax;
}

with an explicit registration of this syscall globally:

asm (
	".pushsection .openbsd.syscalls\n"
	".long  _w, 4\n"
	".popsection\n"
);

We can do this all in one using a feature of inline assembly that lets us generate a unique label for that particular use of the assembly block. This looks as follows:

long
w(void *what, long len)
{
        char err;
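        long rax = 4;  // SYS_write
        __asm__ volatile (
                // %= expands to a number unique to this instance of the asm
                "_w%=: syscall\n"
                // record this site's label and syscall number in the table
                ".pushsection .openbsd.syscalls\n"
                ".long _w%=, 4\n"
                ".popsection\n"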
                : "+a"(rax), "+d"(len), "=@ccc"(err)
                : "D"(1), "S"(what)
                : "rcx", "r11", "memory");
        return err ? -rax : rax;
}

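The %= format string tells the compiler (not the assembler) to substitute a number that is unique to that instance of the asm statement, so each copy of the snippet gets its own label.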
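
To see %= in isolation, here is a minimal sketch, assuming x86-64 and GCC or Clang. The helper count_down and its label name are hypothetical and have nothing to do with syscalls. With a fixed label such as "again:" the two inlined copies would collide and the assembler would reject the duplicate symbol; with %= each copy gets its own name.

```c
/* Hypothetical helper: count n down to zero with a hand-written loop.
 * Assumes x86-64 and n > 0. */
static inline __attribute__((always_inline)) long
count_down(long n)
{
        __asm__ volatile (
                "again%=:\n"        /* label unique to this inlined copy */
                "dec %0\n"          /* n = n - 1, sets the zero flag */
                "jnz again%=\n"     /* loop until n reaches zero */
                : "+r"(n)
                :
                : "cc");
        return n;
}

/* Two inlined copies in one function, so two distinct labels. */
long
twice(void)
{
        return count_down(3) + count_down(5);
}
```

Compiling this and dumping the symbol table should show two separately numbered again labels, one per inlined copy.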

We can look at the labels generated with readelf. A small test program with two calls to write (along with a directive to force them to be inlined) contains the following in its symbol table:

   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS pinsyscall.c
     2: 000000000020143f     0 NOTYPE  LOCAL  DEFAULT    6 _w0
     3: 000000000020149a     0 NOTYPE  LOCAL  DEFAULT    6 _w1
     4: 00000000002014d0    17 FUNC    LOCAL  DEFAULT    6 exit
     5: 00000000002014df     0 NOTYPE  LOCAL  DEFAULT    6 _x2
     6: 00000000002024e8     8 OBJECT  LOCAL  HIDDEN     7 __retguard_1869
     7: 00000000002024f0     8 OBJECT  LOCAL  HIDDEN     7 __retguard_3534
     8: 00000000002024f8     8 OBJECT  LOCAL  HIDDEN     7 __retguard_1545
     9: 00000000002013d0    48 FUNC    GLOBAL DEFAULT    6 start
    10: 00000000002013e3     0 NOTYPE  GLOBAL DEFAULT    6 _start
    11: 0000000000201400   208 FUNC    GLOBAL DEFAULT    6 main

Note the _w0 and _w1 labels corresponding to each of these calls. This particular file is based on the original demo, so it also has the exit syscall implemented - that’s the label _x2 in the above.

In practice, I’d probably write these sorts of syscall stubs in pure assembly, which would prevent inlining anyway, and register the labels in much the same way as Wellons did in his example. However, I thought this was a good chance to document this feature of inline assembly, which is quite useful from time to time, and whose details I readily forget.

A note on the general technique

What this technique really shows is that addresses in the instruction stream can be captured and inserted into a table by using an inline assembly block. Given that inline assembly can also bind specific values to specific registers, tracepoints follow quite naturally - bind the values you want to trace to the registers you want to read them from, emit a compiler-generated label which you insert into a tracepoint table, and then insert a nop instruction which can be patched to jump to tracing code at runtime (by ptrace, for example).
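
The table-building half of that idea can be sketched as follows. This is only an illustration under some assumptions: x86-64, an ELF target, GCC or Clang, and a linker (GNU ld or lld) that defines __start_/__stop_ symbols for sections whose names are valid C identifiers. The section name site_table and the helpers mark_site and site_count are hypothetical. The sketch records the sites but does not bind registers or patch anything.

```c
#include <stddef.h>
#include <stdint.h>

/* Bounds of the hypothetical site_table section, supplied by the linker. */
extern const uintptr_t __start_site_table[];
extern const uintptr_t __stop_site_table[];

/* Mark a site: emit a uniquely named label, a one-byte nop that a tracer
 * could later patch, and record the label's address in site_table. */
static inline __attribute__((always_inline)) void
mark_site(void)
{
        __asm__ volatile (
                "site%=: nop\n"
                ".pushsection site_table, \"aw\", @progbits\n"
                ".balign 8\n"
                ".quad site%=\n"
                ".popsection\n"
                : /* no outputs */
                : /* no inputs */
                : "memory");
}

/* Number of sites recorded in the table. */
static size_t
site_count(void)
{
        return ((uintptr_t)__stop_site_table -
            (uintptr_t)__start_site_table) / sizeof(uintptr_t);
}
```

Each inlined mark_site() adds one entry, so a tracer can walk from __start_site_table to __stop_site_table to find every nop it might patch. The section is marked writable so that the stored addresses can be relocated in position-independent executables.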


  1. Pinned syscalls work quite nicely with static libraries because of a feature called selective linking. It’s not so much the syscall location being limited which is useful, so much as the reduction in available syscalls (although pledge also does a lot here). ↩︎