Inlinable pinned OpenBSD syscalls
March 8, 2025
OpenBSD can restrict the addresses from which syscalls may originate.
Over the last few days there have been some interesting demos here and here, demonstrating how to do this with statically linked binaries.
This is nice if you want to use this security feature (if you believe it has value) but don’t want to use libc [1].
The way this feature appears to work is that if your OpenBSD binary contains a section called .openbsd.syscalls, then OpenBSD will verify that every syscall comes from one of the sites listed in this section.
The section itself is a list of address and syscall number pairs.
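As a rough illustration of that layout, here is a minimal sketch in C. It assumes each entry is two 32-bit words, which is inferred only from the `.long _w, 4` registration quoted in this post. The struct, its field names, and the `pin_allowed` helper are hypothetical; the lookup models the check as described above and is not the kernel's actual implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one .openbsd.syscalls entry: two 32-bit words.
 * Field names are illustrative and not taken from OpenBSD headers. */
struct pin_entry {
    uint32_t addr;   /* address of the syscall instruction */
    uint32_t sysno;  /* syscall number registered for that address */
};

/* Simplified model of the check: return 1 if the (addr, sysno) pair
 * appears in the table, 0 otherwise. */
static int
pin_allowed(const struct pin_entry *tab, size_t n,
            uint32_t addr, uint32_t sysno)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].addr == addr && tab[i].sysno == sysno)
            return 1;
    return 0;
}
```

With a table holding the pairs (0x20143f, 4) and (0x2014df, 1), a lookup for syscall 4 at 0x20143f succeeds, while a lookup for syscall 1 at that same address fails.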
Wellons’ example is already great, and is all that’s needed in practice, but I wanted to document how to make these syscall procedures inlinable, as a way of demonstrating an inline assembly feature I learned while studying and trying to recreate the SystemTap tracepoint macros. It’s debatable whether there is any real value in inlining system calls: they are already expensive relative to the procedure call overhead.
In any case, this allows us to let the compiler decide what to do.
Wellons’ version of write, for example, is as follows:
__attribute((noinline))
long w(void *what, long len)
{
    char err;
    long rax = 4; // SYS_write
    asm volatile (
        "_w: syscall"
        : "+a"(rax), "+d"(len), "=@ccc"(err)
        : "D"(1), "S"(what)
        : "rcx", "r11", "memory"
    );
    return err ? -rax : rax;
}
with an explicit registration of this syscall globally:
asm (
    ".pushsection .openbsd.syscalls\n"
    ".long _w, 4\n"
    ".popsection\n"
);
We can do all of this in one block, using a feature of inline assembly that generates a unique label for each use of the assembly block. This looks as follows:
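long
w(void *what, long len)
{
    char err;
    long rax = 4; // SYS_write
    __asm__ volatile (
        // %= expands to a number unique to this asm instance, so every
        // inlined copy gets its own label and its own table entry.
        "_w%=: syscall\n"
        ".pushsection .openbsd.syscalls\n"
        ".long _w%=, 4\n"
        ".popsection\n"
        : "+a"(rax), "+d"(len), "=@ccc"(err)
        : "D"(1), "S"(what) // fd 1 (stdout) and the buffer
        : "rcx", "r11", "memory");
    return err ? -rax : rax; // carry flag set: rax holds the error number
}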
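The %= format specifier tells the compiler (not the assembler) to emit a number that is unique to each instance of the asm statement, so every copy of the snippet gets its own unique label.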
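To see the uniqueness in isolation, here is a minimal sketch that assumes x86-64 and GCC or Clang extended asm. The `label_addr` helper and the `here` label prefix are hypothetical names chosen for illustration. Each inlined copy defines its own label and loads that label's address, so two call sites yield two different addresses.

```c
/* Assumes x86-64 and GCC or Clang extended asm. */

/* Hypothetical helper: returns the address of its own label.
 * always_inline gives every call site its own copy of the asm block,
 * and %= gives each copy a distinct label name. */
static inline __attribute__((always_inline)) const void *
label_addr(void)
{
    const void *p;
    __asm__ volatile (
        "here%=: lea here%=(%%rip), %0"
        : "=r"(p));
    return p;
}
```

Without %=, the second inlined copy would define the same label again and the assembler would reject the duplicate symbol.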
We can look at the labels generated with readelf.
A small test program with two calls to write (along with a directive to force them to be inlined) contains the following in its symbol table:
 Num: Value             Size Type    Bind   Vis     Ndx Name
   0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT UND
   1: 0000000000000000     0 FILE    LOCAL  DEFAULT ABS pinsyscall.c
   2: 000000000020143f     0 NOTYPE  LOCAL  DEFAULT   6 _w0
   3: 000000000020149a     0 NOTYPE  LOCAL  DEFAULT   6 _w1
   4: 00000000002014d0    17 FUNC    LOCAL  DEFAULT   6 exit
   5: 00000000002014df     0 NOTYPE  LOCAL  DEFAULT   6 _x2
   6: 00000000002024e8     8 OBJECT  LOCAL  HIDDEN    7 __retguard_1869
   7: 00000000002024f0     8 OBJECT  LOCAL  HIDDEN    7 __retguard_3534
   8: 00000000002024f8     8 OBJECT  LOCAL  HIDDEN    7 __retguard_1545
   9: 00000000002013d0    48 FUNC    GLOBAL DEFAULT   6 start
  10: 00000000002013e3     0 NOTYPE  GLOBAL DEFAULT   6 _start
  11: 0000000000201400   208 FUNC    GLOBAL DEFAULT   6 main
Note the _w0 and _w1 labels corresponding to each of these calls.
This particular file is based on the original demo, so it also has the exit syscall implemented; that’s the label _x2 in the above.
In practice, I’d probably write these sorts of syscall stubs in pure assembly, which would prevent inlining anyway, and register the labels in much the same way as Wellons did in his example.
However, I thought this was a good chance to document this feature of inline assembly, which is quite useful from time to time and whose details I forget quite readily.
A note on the general technique
What this technique really shows is that addresses in the instruction stream can be captured and inserted into a table by using an inline assembly block.
Given that inline assembly can also bind specific values to specific registers, tracepoints follow quite naturally: bind the values you want to trace to the registers you want to read them from, emit a compiler-generated label which you insert into a tracepoint table, and then insert a nop instruction which can be patched to jump to tracing code at runtime (by ptrace, for example).
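The table-building half of this can be sketched as follows. This is a minimal illustration rather than how SystemTap actually does it: it assumes a 64-bit ELF target, GCC or Clang, and a linker (GNU ld or lld) that defines `__start_`/`__stop_` symbols for sections whose names are valid C identifiers. The section name `trace_sites`, the `TRACE_SITE` macro, the struct, and the two example functions are all hypothetical. Binding traced values to registers and patching the nop at runtime are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: site address plus a caller-chosen id. */
struct trace_site {
    uintptr_t addr;
    uintptr_t id;
};

/* Section bounds supplied by the linker (assumed GNU ld or lld). */
extern struct trace_site __start_trace_sites[];
extern struct trace_site __stop_trace_sites[];

/* Emit a patchable nop under a unique label, and record the label's
 * address and the id in the trace_sites section. */
#define TRACE_SITE(id)                                  \
    __asm__ volatile (                                  \
        "trace_site_%=: nop\n"                          \
        ".pushsection trace_sites, \"aw\"\n"            \
        ".balign 8\n"                                   \
        ".quad trace_site_%=, %c0\n"                    \
        ".popsection\n"                                 \
        : : "i"(id))

/* Two example sites; noinline keeps exactly one copy of each. */
__attribute__((noinline)) void step_one(void) { TRACE_SITE(1); }
__attribute__((noinline)) void step_two(void) { TRACE_SITE(2); }

/* Number of sites the linker collected into the table. */
size_t
trace_site_count(void)
{
    return (size_t)(__stop_trace_sites - __start_trace_sites);
}
```

A tracer could then walk the entries from `__start_trace_sites` to `__stop_trace_sites` to find every nop it may patch. The section is marked writable here so that the absolute addresses can be relocated in a position-independent executable.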
[1] Pinned syscalls work quite nicely with static libraries because of a feature called selective linking. What is useful is not so much the restriction on syscall locations as the reduction in available syscalls (although pledge also does a lot here).