Today I learned
January 1, 0001
This is an account of things I've learned day-by-day. Some are things I've looked up and maybe even used before, and subsequently forgot.
Some entries are mostly links to good articles that I've read.
date can be used to convert between timezones
$ date --date='TZ="US/Pacific" 09:00 next Fri'
Fri 15 May 17:00:00 BST 2020
Assembly and C linkage
A clarification, more than a TIL. Given a label foo in assembly
foo: db 'This is a string',0
and a declaration in C
extern char foo;
C will refer to the data at label foo, in this case the first byte 'T'. &foo will provide the address of the label.
If the string foo is desired, declare
extern char foo[];
gdb can connect to a remote server for debugging
gdb can connect over TCP, a Unix socket etc. to a remote target (for example a gdb server, which can launch and/or attach to a process). Debug symbols are only needed in the local binary.
qemu has some integration with this functionality to help debug kernels etc.
org-mode can insert dates with C-c !
Handy for writing the date next to items in org-mode. Uses a calendar for selection.
iret needs a special form to return from an interrupt handler in 64-bit mode (it defaults to a 32-bit return).
iretq does this.
Linux calling convention stack alignment
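The System V ABI used on Linux expects the stack to be 16-byte aligned immediately before a call instruction.
call pushes the return address to the stack (8 bytes), so on entry to a function rsp is 8 bytes away from a 16-byte boundary. Before making a further call, the function needs to have moved the stack by an odd number of 8-byte slots (for example a single push rbp) to restore the alignment.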
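A minimal sketch of the arithmetic, assuming the System V AMD64 convention that rsp is a multiple of 16 immediately before call. The helper names are hypothetical and only model the stack pointer as an integer; nothing here touches the real stack.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the (modelled) stack pointer sits on a 16-byte boundary. */
static bool is_aligned16(uint64_t rsp) {
    return (rsp & 0xF) == 0;
}

/* call pushes an 8-byte return address, so rsp drops by 8. */
static uint64_t rsp_after_call(uint64_t rsp) {
    return rsp - 8;
}

/* Each push of a 64-bit register drops rsp by a further 8 bytes. */
static uint64_t rsp_after_pushes(uint64_t rsp, unsigned count) {
    return rsp - (uint64_t)8 * count;
}
```

Starting from an aligned rsp, rsp_after_call gives a misaligned value, and one (or three, or five) pushes bring it back onto a 16-byte boundary.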
Linux allows multiple returns in registers
More precisely, integers are returned in rdx:rax, while floating point numbers are returned in xmm1:xmm0.
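From C, one way to see this is to return a small struct by value. The sketch below uses a hypothetical divmod function; under the System V AMD64 ABI a struct of two 64-bit integers like this is normally handed back in rax and rdx, though the exact code generated is up to the compiler.

```c
/* Two 64-bit integers: small enough to come back in registers. */
struct divmod_result {
    long quotient;
    long remainder;
};

/* With the System V AMD64 ABI, quotient is expected in rax and
   remainder in rdx when this function returns. */
struct divmod_result divmod(long numerator, long denominator) {
    struct divmod_result r;
    r.quotient = numerator / denominator;
    r.remainder = numerator % denominator;
    return r;
}
```

Compiling with gcc -O2 -S and reading the assembly is a quick way to check which registers were actually used.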
x86_64 has conditional jump, mov and set instructions
Conditional jump is well known and standard, but (the prefix) cmov provides a conditional move, and (the prefix) set provides a conditional set.
These operations can be conditioned on (s)sign, (z)zero, (c)carry or (o)overflow. (n) can be added to negate.
For example, cmovno rax, rbx moves rbx into rax if the previous instruction did not overflow. Note that the source of a cmov must be a register or memory operand, not an immediate.
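These instructions are what a compiler tends to reach for when it turns simple conditionals into branch-free code. The two functions below are hypothetical illustrations; whether cmov or set is actually emitted depends on the compiler and optimisation level.

```c
/* Selecting between two values; optimising compilers commonly emit
   cmp followed by a cmov here instead of a branch (not guaranteed). */
long max_long(long a, long b) {
    return a > b ? a : b;
}

/* Turning a condition into 0 or 1; commonly test followed by sete. */
int is_zero(long x) {
    return x == 0;
}
```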
NASM has a nice tutorial
NASM has a tutorial.
hlt waits for an interrupt to come in
hlt waits for an interrupt. Kernels often boil down to installing a bunch of interrupt handlers, and then entering a very simple event loop:
loop:
    hlt
    jmp loop
Handlers for double faults can't return
Implementing handlers for double faults is useful nonetheless, for outputting diagnostic information. Care needs to be taken with stack overflows, since the interrupt handler won't be able to push to the stack. For this reason it's a good idea to have a separate stack for the double fault handler, and to use long mode's interrupt stack switching mechanism (the interrupt stack table) to switch to it.
This requires setting up a TSS (Task state segment).
nm can be used to inspect symbols in an object file
This is a nice alternative to
objdump -h can be used to look at the sections in a binary
It's interesting to compare the sections added by gcc versus an executable built with nasm and linked directly with ld.
The basics of linker scripts
This page gives a nice overview of the basics. Linker scripts are handy for controlling how executables are laid out in memory.
The GNU linker allows symbols to be versioned
The linker scripts introductory article gives the basics, but there is a GNU extension which allows symbols to be exposed by version. It's not clear this is a good idea. Linker scripts can be explicit about what they want to link against.
__asm__(".symver current_foo, foo@");
__asm__(".symver v1_foo, foo@VERS_1.1");
Freeing memory can impact latency by flushing TLBs across cores
Freeing memory can invalidate entries in the translation lookaside buffers, and require them to be flushed on every core that has the mapping cached. The resulting TLB misses can impact latency in a system.
GRUB has tools for checking multiboot headers
GRUB comes with tools for checking multiboot headers
$ grub-file --is-x86-multiboot build/kernel.x86_64
$ grub-file --is-x86-multiboot2 build/kernel.x86_64
grub-file exits with 0 on success, and 1 on failure.
Note, to boot a multiboot 2 kernel, GRUB needs to boot the kernel with the multiboot2 command, as opposed to multiboot.
Sized data can be loaded into registers by using the correct names
Some registers have names for subranges. For example,
mov al, [pos]  ; Loads byte [pos] into lower byte of ax
mov ah, [pos]  ; Loads byte [pos] into upper byte of ax
mov ax, [pos]  ; Loads 16-bit word [pos] into ax
mov eax, [pos] ; Loads 32-bit double word into eax
mov rax, [pos] ; Loads 64-bit quad word into rax
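The relationship between the names can be modelled in C with masks and shifts. This is only a sketch of how the subranges overlap; the struct and function names are made up for illustration.

```c
#include <stdint.h>

/* The named subranges of rax, extracted from a 64-bit value. */
struct subregisters {
    uint8_t  al;   /* bits 0-7   */
    uint8_t  ah;   /* bits 8-15  */
    uint16_t ax;   /* bits 0-15  */
    uint32_t eax;  /* bits 0-31  */
};

struct subregisters split_rax(uint64_t rax) {
    struct subregisters s;
    s.al  = (uint8_t)(rax & 0xFF);
    s.ah  = (uint8_t)((rax >> 8) & 0xFF);
    s.ax  = (uint16_t)(rax & 0xFFFF);
    s.eax = (uint32_t)(rax & 0xFFFFFFFF);
    return s;
}
```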
It's possible to test individual bits using the bt instruction
The bt instruction allows you to test a bit, storing the result in the carry flag. Jump on carry can be used to act on this. For example
    bt eax, 4        ; Test bit 4 in eax
    jnc bit_not_set  ; Jump to bit_not_set if bit 4 wasn't set
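For comparison, the same test written in C is a shift and a mask. The function name test_bit is hypothetical; the masking of the index mirrors how bt treats a register operand (the bit offset is taken modulo the operand size).

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the value of bit `index` (0 = least significant) of `value`,
   which is what bt would leave in the carry flag. */
bool test_bit(uint32_t value, unsigned index) {
    return ((value >> (index & 31)) & 1u) != 0;
}
```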
C-r inverts colours in zathura
This makes reading some pdfs much more friendly on the eyes.
How to use x86_64 repeat string operations
This is useful for mass zeroing memory. String instructions such as stosb (store byte string) can be repeated.
stosb will copy al to the address in es:edi, and increment or decrement edi afterwards (incrementing if the DF flag in EFLAGS is zero, and decrementing if it is one). In 64-bit mode the address is taken from rdi and the count from rcx.
rep will repeat a string instruction, decrementing ecx each time, until the value in ecx is zero, so to zero 100 bytes starting at a label foo we can
    xor eax, eax   ; Clear eax (and hence al)
    mov ecx, 100   ; Counter for 100 bytes
    mov edi, foo   ; The address we want to start writing to
    cld            ; Clear DF so that edi is incremented
    rep stosb      ; Zero the bytes!

foo: resb 100
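The same instruction can be reached from C with inline assembly. This is a sketch assuming GCC or Clang on x86-64; zero_bytes is a hypothetical helper, and on any other target it simply falls back to memset.

```c
#include <stddef.h>
#include <string.h>

/* Zero `count` bytes starting at `dst`. */
void zero_bytes(void *dst, size_t count) {
#if defined(__x86_64__) && defined(__GNUC__)
    /* rdi = destination, rcx = count, al = byte to store.
       The ABI guarantees DF is clear on function entry, so rdi
       is incremented after each store. */
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(count)
                     : "a"(0)
                     : "memory");
#else
    memset(dst, 0, count); /* Portable fallback on other targets. */
#endif
}
```

In practice memset is the better choice, since the compiler and libc already pick a good strategy; the inline assembly is only here to show the register roles.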
Page table structures
This is something I knew before, but had forgotten. Page table entries on x86_64 (and presumably other architectures which use something like paging) contain the addresses of subsequent structures, either pages in memory, or further page table structures. Each page table entry is essentially the address of the next structure that matters, with the bits that aren't used filled in with control bits. An obvious idea, but very neat!
To make this clearer, suppose a page table structure is pointing at a 2MB page, at address addr. If addr has length n bits, then the page table structure has structure CTL_BITS:addr[n:21]:CTL_BITS, since the low 21 bits (bits 0 to 20) are used as an offset into the page, and x86_64 doesn't have full 64-bit addressing.
A simple idea, and in some way the obvious way to do it, but it saves shifting chunks of addresses around. Some faff is still required in 32-bit protected mode however, because only 32-bit double words can be written, and the address straddles the word boundary.
In addition to this, because pages and page table structures need to be aligned on appropriate boundaries, the lower bits are guaranteed to be zero, so they don't need to be manually cleared.
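A minimal sketch of building such an entry for a 2MB page. The macro and function names are hypothetical; the bit positions used (present = bit 0, writable = bit 1, page size = bit 7, address in bits 21 to 51) follow the x86_64 paging layout, and only a few of the control bits are shown.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_PRESENT  (1ULL << 0)  /* entry is valid */
#define PAGE_WRITABLE (1ULL << 1)  /* writes allowed */
#define PAGE_HUGE     (1ULL << 7)  /* PS bit: entry maps a 2MB page */

#define PAGE_2MB_OFFSET_MASK ((1ULL << 21) - 1)     /* bits 0-20  */
#define PAGE_2MB_ADDR_MASK   0x000FFFFFFFE00000ULL  /* bits 21-51 */

/* Combine a 2MB-aligned physical address with control bits.
   No shifting is needed: the address already sits in the right place. */
uint64_t make_2mb_entry(uint64_t phys_addr, uint64_t flags) {
    assert((phys_addr & PAGE_2MB_OFFSET_MASK) == 0); /* must be aligned */
    return (phys_addr & PAGE_2MB_ADDR_MASK) | flags;
}

/* Recover the physical address by masking off the control bits. */
uint64_t entry_address(uint64_t entry) {
    return entry & PAGE_2MB_ADDR_MASK;
}
```

Because the address is aligned, its low bits are already zero, so OR-ing in the flags cannot disturb it.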
glibc has hooks for memory allocation functions
These allow you to intercept calls to malloc, realloc and free.
musl libc doesn't appear to have the same utilities.
See the documentation on these hooks for more details.
The GNU linker allows external symbols to be wrapped
The GNU linker allows a --wrap= option to be specified to allow e.g. interception of calls to malloc.
The GNU linker documentation has more details (see the --wrap option).
gold linker is installed as part of binutils
The gold linker, distinct from the GNU linker (ld), is installed as part of binutils. It can be invoked on a system using ld.gold.
Defining themes in emacs
Defining basic themes in emacs is quite straightforward:
(deftheme mytheme)

(let ((color-bg "#2f4858")
      (color-bg-alt "#25333d")
      (color-fg "#ffedcb")
      (color-yellow "#d79921")
      (color-orange "#f56800")
      (color-green "#d6d129")
      (color-cyan "#009c89")
      (color-red "#d92929")
      (color-grey "#b6a99a"))
  (custom-theme-set-faces
   'mytheme
   `(default ((t (:foreground ,color-fg :background ,color-bg-alt))))
   `(fringe ((t (:foreground ,color-fg :background ,color-bg-alt))))
   `(highlight ((t (:foreground nil :background ,color-bg))))

   ;; Font lock faces
   `(font-lock-builtin-face ((t (:foreground ,color-fg))))
   `(font-lock-comment-face ((t (:foreground ,color-cyan))))
   `(font-lock-constant-face ((t (:foreground ,color-fg))))
   `(font-lock-function-name-face ((t (:foreground ,color-green :weight bold))))
   `(font-lock-keyword-face ((t (:foreground ,color-orange))))
   `(font-lock-string-face ((t (:foreground ,color-grey))))
   `(font-lock-type-face ((t (:foreground ,color-yellow :weight bold))))
   `(font-lock-variable-name-face ((t (:foreground ,color-fg))))))

(provide-theme 'mytheme)
Additional customizations can be set to get a reasonable appearance.
The difference between interrupt gates and trap gates
When execution moves through a trap gate, further external hardware interrupts are not masked (the IF flag is not cleared).
Interrupt gates ensure that the IF flag is cleared, so further external interrupts cannot happen during the execution of the interrupt handler.
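In the IDT the two kinds of gate differ only in the type field of the descriptor. The sketch below lays out a 64-bit gate descriptor in C; the struct, macro and function names are hypothetical, while the field layout and the type values (0xE for an interrupt gate, 0xF for a trap gate) are the long mode ones.

```c
#include <stdint.h>

#define GATE_PRESENT   0x80  /* P bit */
#define GATE_INTERRUPT 0x0E  /* gate type: IF is cleared on entry */
#define GATE_TRAP      0x0F  /* gate type: IF is left unchanged */

/* 16-byte long mode gate descriptor. The fields are naturally
   aligned, so no packing attribute is needed. */
struct idt_gate {
    uint16_t offset_low;   /* handler address bits 0-15 */
    uint16_t selector;     /* code segment selector */
    uint8_t  ist;          /* interrupt stack table index (0 = none) */
    uint8_t  type_attr;    /* present bit, DPL and gate type */
    uint16_t offset_mid;   /* handler address bits 16-31 */
    uint32_t offset_high;  /* handler address bits 32-63 */
    uint32_t reserved;
};

/* Build a present, ring 0 gate of the given type for `handler`. */
struct idt_gate make_gate(uint64_t handler, uint16_t selector, uint8_t type) {
    struct idt_gate g;
    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.selector    = selector;
    g.ist         = 0;
    g.type_attr   = (uint8_t)(GATE_PRESENT | type);
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_high = (uint32_t)(handler >> 32);
    g.reserved    = 0;
    return g;
}
```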
tsp the task spooler
I knew about this before, but I'm noting it here since I'm prone to forgetting about it.
task-spooler is a handy shell tool for queueing tasks. It's accessed using the binary tsp.
Executing normal mode commands in vim on a block
If a block has been selected in visual mode, line-wise commands can be run across that block by typing :normal followed by the command.
make will match on the source as well as the target in multiple rules
It's possible to have multiple rules for a target which are discriminated by the dependencies. This allows, for example, easy mixing of C and assembly files.
%.o: %.c Makefile
	$(CC) -c $< -o $@

%.o: %.asm Makefile
	nasm -felf64 $< -o $@
How patterns match provides more information.
Lots about TTY
This article explains a lot about the TTY.
Locked instructions can sometimes avoid locking the bus
Some instructions, such as xchg with a memory operand, will automatically behave as if prefixed with lock to ensure consistency across processors.
lock ensures that the processor has exclusive access to the memory it references. In general this can raise the lock signal on the bus, but if the memory location is in the processor's cache, and within a single cache line, only the processor's cache is locked.
This page provides more detail, as does the Intel manual.
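From C, the portable way to get this behaviour is an atomic exchange from stdatomic.h, which x86-64 compilers typically implement with xchg. The spinlock type and function names below are hypothetical, and this is only a sketch of a try-lock, not a production lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct spinlock {
    atomic_int held;  /* 0 = free, 1 = taken */
};

/* Atomically swap in 1 and look at what was there before.
   If the old value was 0, the caller now owns the lock. */
bool spin_try_lock(struct spinlock *lock) {
    return atomic_exchange(&lock->held, 1) == 0;
}

/* Release the lock by storing 0. */
void spin_unlock(struct spinlock *lock) {
    atomic_store(&lock->held, 0);
}
```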
Some details on cache coherency and how that impacts memory models
Here is a good article on this.
Embedding files in binaries
This article describes how to do this with ld. I've used this before, but I'm making a note of it here for future use.
firejail has a default profile
firejail has a default profile, which can be used to run any program. Without an argument, it will run a shell. For example firejail zsh runs zsh in a basic sandbox. In this sandbox, for example, ~/.ssh is not readable.
Makefile automatic variables
GNU make has a nice page summarizing the automatic variables available in make rules. The documentation is available here.
Escaping a stuck ssh session
Type enter followed by ~. (tilde, fullstop). The combination of enter followed by ~ acts as an escape character.
Font rendering is more interesting than it at first seems
There seem to be some interesting algorithms to render fonts efficiently. See e.g. an explanation of the stb TrueType rasterizer.
XEmbed is used for embedding one X window inside another
XEmbed is the protocol used to embed one GUI inside another. It's what allows e.g. st to work with tabbed. tabbed expects child windows to support XEmbed.
movsb is a repeatable instruction (takes rep prefixes) for copying strings
I missed this before despite looking for it. There are also examples for moving words and doubles.
How to cherry-pick commits with magit
Turns out this is crazy ergonomic. Start with the branch you want to pick commits into checked out. Then press l to open the git log, and o to switch to another branch. Select a branch to switch to (one where the commits you need reside).
Go to the commits you want to pick, and press A A followed by enter. You've cherry picked the commit.
Referencing variables from linker scripts
It's possible to define variables in linker scripts and then resolve them in source code. This is useful for a binary to have some idea of where it is in memory, and how much memory it's occupying.
For example, we can define position in a linker script:
position = .;
And then reference it from C source code with external linkage, but by taking its address (there is no memory allocated in the binary, only a symbol in the symbol table).
extern uint8_t position;
...
&position
It's a little simpler in assembly, because an extern declaration already resolves to the address
extern position
foo: dq position
man ascii is a thing
Turns out that
man ascii is a thing (ASCII has a man page), with a convenient table of characters.