Today I learned


This is an account of things I've learned day-by-day. Some are things I've looked up and maybe even used before, and subsequently forgot.

Some entries are mostly links to good articles that I've read.

May 2020

[2020-05-12 Tue]

date can be used to convert between timezones

For example:

$ date --date='TZ="US/Pacific" 09:00 next Fri'
Fri 15 May 17:00:00 BST 2020

[2020-05-13 Wed]

Assembly and C linkage

A clarification, more than a TIL. Given a label foo in assembly

  foo: db 'This is a string',0

and a declaration in C

extern char foo;

foo in C will refer to the data at label foo, in this case 'T', whereas &foo will provide the address of the label foo.

If the string foo is desired, declare foo as

extern char foo[];

[2020-05-14 Thu]

gdb can connect to a remote server for debugging

gdb can connect over TCP, a Unix socket, a serial line etc. to a remote target (typically gdbserver, which can launch a program or attach to a running process). Debug symbols are only needed in the local copy of the binary.

qemu has some integration with this functionality to help debug kernels etc.

org-mode can insert dates with C-c !

Handy for writing the date next to items in org-mode. Uses a calendar for selection.

iret vs iretq

Returning from an interrupt handler in 64-bit mode needs a special form of iret: plain iret defaults to a 32-bit operand size, whereas iretq pops the 64-bit return frame.

[2020-05-15 Fri]

Linux calling convention stack alignment

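The System V AMD64 calling convention used by Linux expects the stack to be 16-byte aligned immediately before a call instruction. call pushes the return address to the stack (8 bytes), so on entry to a function rsp is 8 bytes off a 16-byte boundary. A function therefore needs to push an odd number of 8-byte values (or otherwise adjust rsp by 8 modulo 16) before making calls of its own.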
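The arithmetic is easy to get backwards, so here is a small sketch of it in C. The function names are my own, and rsp is just modelled as an integer; the only rule assumed is the System V AMD64 one that rsp must be a multiple of 16 immediately before call.

```c
#include <stdint.h>

/* Extra bytes to subtract from rsp (the stack grows downwards) so that
   it is a multiple of 16 immediately before a call instruction. */
uint64_t padding_before_call(uint64_t rsp)
{
    return rsp % 16;
}

/* Value of rsp as seen by the callee on entry: call has just pushed
   the 8-byte return address. */
uint64_t rsp_at_entry(uint64_t rsp_before_call)
{
    return rsp_before_call - 8;
}
```

So a function entered with rsp ending in 8 that pushes a single register (e.g. rbp) is aligned again and can call straight away.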

Linux allows multiple returns in registers

More precisely, integers are returned in rax or rdx:rax, while floating point numbers are returned in xmm0 or xmm1:xmm0.
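From C, the usual way to see this is to return a small struct. A minimal sketch, assuming the System V AMD64 ABI on x86_64 Linux (where long is 64 bits): a struct of two integers fits in 16 bytes, so it should come back in rax and rdx rather than through memory. The names here are made up for illustration.

```c
/* Two 64-bit integers: small enough to be returned in rax (first
   member) and rdx (second member) under the System V AMD64 ABI. */
struct div_result {
    long quotient;
    long remainder;
};

struct div_result divide(long n, long d)
{
    struct div_result r = { n / d, n % d };
    return r;
}
```

Compiling with gcc -O2 -S and reading the assembly is a quick way to check which registers are actually used.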

x86_64 has conditional jump, mov and set instructions

Conditional jump is well known and standard, but the cmov family (cmovz, cmovno, ...) provides a conditional move, and the set family (setz, setc, ...) sets a byte to 1 or 0 depending on the condition.

These operations can be conditioned on (s)sign, (z)zero, (c)carry or (o)overflow. (n) can be added to negate.

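For example cmovno rax, rbx moves rbx into rax if the overflow flag is clear, i.e. the previous arithmetic instruction did not overflow. Note that cmov takes a register or memory source, not an immediate.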
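These instructions are what compilers tend to reach for with branch-free C. A small sketch (function names are mine): at -O2, gcc and clang commonly turn the first function into cmp plus a cmov, and the second into a set-style sequence, though that is an optimisation choice and not guaranteed.

```c
#include <stdint.h>

/* Candidate for cmp + cmovl: pick one of two values without a branch. */
int64_t select_min(int64_t a, int64_t b)
{
    return (b < a) ? b : a;
}

/* Candidate for a setcc-style sequence: materialise a condition as 0 or 1. */
int is_negative(int64_t x)
{
    return x < 0;
}
```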

NASM has a nice tutorial

[2020-05-16 Sat]

hlt waits for an interrupt to come in

The instruction hlt waits for an interrupt. Kernels often boil down to installing a bunch of interrupt handlers, and then entering a very simple event loop:

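event_loop:
  hlt               ; Sleep until the next interrupt arrives
  jmp event_loop    ; The handler has returned, so wait again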

Handlers for double faults can't return

Implementing handlers for double faults is useful nonetheless, for outputting diagnostic information. Care needs to be taken for stack overflows, since the interrupt handler won't be able to push to the stack. For this reason it's a good idea to have a separate stack for the double fault handler, and use long mode's interrupt stack switching mechanism (the Interrupt Stack Table) to handle it.

This requires setting up a TSS (Task state segment).

nm can be used to inspect symbols in an object file

This is a nice alternative to objdump -t (or, with nm -D, to objdump -T for dynamic symbols).

objdump -h can be used to look at the sections in a binary

It's interesting to compare the sections added by gcc versus an executable built with nasm and linked directly with ld.

The basics of linker scripts

This page gives a nice overview of the basics. Linker scripts are handy for controlling how executables are laid out in memory.

The GNU linker allows symbols to be versioned

The linker scripts introductory article gives the basics, but there is a GNU extension which allows symbols to be exposed by version. It's not clear this is a good idea. Linker scripts can be explicit about what they want to link against.

__asm__(".symver current_foo, foo@");
__asm__(".symver v1_foo, foo@VERS_1.1");

[2020-05-18 Mon]

Freeing memory can impact latency by flushing TLBs across cores

Freeing memory can unmap pages, which invalidates entries in the translation lookaside buffers (TLBs). Those entries have to be flushed on every core that may have cached them, and the resulting TLB misses can impact latency in a system.

GRUB has tools for checking multiboot headers

The grub-file tool can check whether a kernel image has a valid multiboot header:

$ grub-file --is-x86-multiboot build/kernel.x86_64

$ grub-file --is-x86-multiboot2 build/kernel.x86_64

Both return 0 on success, and 1 on failure.

Note, to boot a multiboot 2 kernel, GRUB needs to boot the kernel with

multiboot2 /path/to/kernel

as opposed to:

multiboot /path/to/kernel

[2020-05-20 Wed]

Sized data can be loaded into registers by using the correct names

Some registers have names for subranges. For example,

  mov al, [pos]     ; Loads byte [pos] into lower byte of ax
  mov ah, [pos]     ; Loads byte [pos] into upper byte of ax
  mov ax, [pos]     ; Loads 16-bit word [pos] into ax
  mov eax, [pos]    ; Loads 32-bit double word into eax
  mov rax, [pos]    ; Loads 64-bit quad word into rax

[2020-05-22 Fri]

It's possible to test individual bits using the bt instruction

The bt instruction allows you to test a bit, storing the result in the carry bit. Jump on carry can be used to act on this. For example

  bt eax, 4         ; Test bit 4 in eax
  jnc bit_not_set   ; Jump to bit_not_set if bit 4 wasn't set
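For comparison, the same test in C is a shift and a mask. This is only a sketch with a name I made up; the bit index is masked to 0..31 to mirror how bt treats the index for a 32-bit register operand.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the given bit of value is set. The index wraps modulo 32,
   as bt does when its first operand is a 32-bit register. */
bool bit_is_set(uint32_t value, unsigned bit)
{
    return ((value >> (bit & 31u)) & 1u) != 0;
}
```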

C-r inverts colours in zathura

This makes reading some pdfs much more friendly on the eyes.

[2020-05-23 Sat]

How to use x86_64 repeat string operations

This is useful for mass zeroing memory. String instructions such as stosb (store string byte) can be repeated. stosb copies al to the address in es:edi (rdi in 64-bit mode), and increments or decrements edi afterwards (incrementing if the DF flag in EFLAGS is zero, and decrementing if it is one).

rep repeats a string instruction, decrementing ecx each time, until ecx is zero. So to zero 100 bytes starting at a label foo we can write:

  cld             ; Clear DF so that edi increments
  xor eax, eax    ; Clear eax (and hence al)
  mov ecx, 100    ; Counter for 100 bytes
  mov edi, foo    ; The address we want to start writing to
  rep stosb       ; Zero the bytes!

foo: resb 100
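To pin down what the instruction does, here is a rough software model of rep stosb in C. It is not how you would zero memory in C (that's memset); the function name and parameters are my own stand-ins for al, ecx, edi and DF.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of `rep stosb`: store al at dest, step dest by +1 (DF = 0) or
   -1 (DF = 1), and repeat until the count reaches zero. Returns the
   final value of the destination pointer, as edi would hold it. */
uint8_t *model_rep_stosb(uint8_t *dest, uint8_t al, size_t count,
                         int direction_flag)
{
    while (count != 0) {
        *dest = al;
        dest += direction_flag ? -1 : 1;
        count--;
    }
    return dest;
}
```

Note that with a count of zero nothing is written at all, which matches the hardware behaviour.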

Page table structures

This is something I knew before, but had forgotten. Page table entries on x86_64 (and presumably other architectures which use something like paging) contain addresses of subsequent structures, either pages in memory, or further page table structures. Each page table entry is essentially the address of the next structure that matters, with the bits that aren't used filled in with control bits. An obvious idea, but very neat!

To make this clearer, suppose a page table entry is pointing at a 2MB page, at address addr. If addr is n bits long, then the entry has the layout CTL_BITS:addr[n:21]:CTL_BITS, since the low 21 bits (bits 0 to 20) are used as an offset into the page, and x86_64 doesn't have full 64-bit addressing.

A simple idea, and in some way the obvious way to do it, but it saves shifting chunks of addresses around. Some faff is still required in 32-bit protected mode however, because only 32-bit double words can be written, and the address straddles the word boundary.

In addition to this, because pages and page table structures need to be aligned on appropriate boundaries, the lower bits are guaranteed to be zero, so they don't need to be manually cleared.
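As a sketch of how little work this leaves, building an entry for a 2MB page in C is just an OR. The macro and function names are mine; the bit positions (present = bit 0, writable = bit 1, page size = bit 7, address in bits 21 to 51) are the standard x86_64 ones.

```c
#include <stdint.h>

#define PTE_PRESENT       (1ULL << 0)
#define PTE_WRITABLE      (1ULL << 1)
#define PTE_HUGE          (1ULL << 7)            /* Entry maps a 2MB page */
#define PDE_2MB_ADDR_MASK 0x000FFFFFFFE00000ULL  /* Address bits 21..51 */

/* Combine a 2MB-aligned physical address with control bits. The mask
   is only a safety net: an aligned address already has zeros there. */
uint64_t make_pde_2mb(uint64_t phys_addr, uint64_t flags)
{
    return (phys_addr & PDE_2MB_ADDR_MASK) | flags | PTE_HUGE;
}

/* Recover the page address from an entry by dropping the control bits. */
uint64_t pde_2mb_address(uint64_t entry)
{
    return entry & PDE_2MB_ADDR_MASK;
}
```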

[2020-05-24 Sun]

glibc has hooks for memory allocation functions

These allow you to intercept calls to malloc / calloc / free etc. musl libc doesn't appear to have the same utilities. Note that the hooks are deprecated, and were removed from the API in glibc 2.34.

See the documentation on these hooks for more details.

[2020-05-25 Mon]

The GNU linker allows external symbols to be wrapped

The GNU linker allows a --wrap= option to be specified to allow e.g. interception of calls to malloc.

The GNU linker documentation has more details (see --wrap).

The gold linker is installed as part of binutils

The gold linker, distinct from the GNU linker (ld), is installed as part of binutils. It can be invoked on a system as gold (often also installed as ld.gold).

[2020-05-29 Fri]

Defining themes in emacs

Defining basic themes in emacs is quite straightforward:

(deftheme mytheme)

(let ((color-bg "#2f4858")
      (color-bg-alt "#25333d")
      (color-fg "#ffedcb")
      (color-yellow "#d79921")
      (color-orange "#f56800")
      (color-green "#d6d129")
      (color-cyan "#009c89")
      (color-red "#d92929")
      (color-grey "#b6a99a"))
  (custom-theme-set-faces
   'mytheme
   `(default ((t (:foreground ,color-fg :background ,color-bg-alt))))
   `(fringe ((t (:foreground ,color-fg :background ,color-bg-alt))))
   `(highlight ((t (:foreground nil :background ,color-bg))))
   ;; Font lock faces
   `(font-lock-builtin-face ((t (:foreground ,color-fg))))
   `(font-lock-comment-face ((t (:foreground ,color-cyan))))
   `(font-lock-constant-face ((t (:foreground ,color-fg))))
   `(font-lock-function-name-face ((t (:foreground ,color-green :weight bold))))
   `(font-lock-keyword-face ((t (:foreground ,color-orange))))
   `(font-lock-string-face ((t (:foreground ,color-grey))))
   `(font-lock-type-face ((t (:foreground ,color-yellow :weight bold))))
   `(font-lock-variable-name-face ((t (:foreground ,color-fg))))))

(provide-theme 'mytheme)

Additional customizations can be set to get a reasonable appearance.

[2020-05-31 Sun]

The difference between interrupt gates and trap gates

When execution moves through a trap gate, further external hardware interrupts are not masked (the IF flag is not cleared). Interrupt gates ensure that the IF flag is cleared, so further external interrupts cannot happen during the execution of the interrupt handler.
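In the IDT itself the two differ by a single bit of the gate type. Here is a sketch of a 64-bit IDT entry in C; the struct, macro and function names are my own, while the field layout and the type values (0xE for an interrupt gate, 0xF for a trap gate) follow the long mode descriptor format.

```c
#include <stdint.h>

#define GATE_PRESENT   0x80u  /* P bit */
#define GATE_INTERRUPT 0x0Eu  /* IF is cleared on entry */
#define GATE_TRAP      0x0Fu  /* IF is left as it was */

/* 64-bit mode IDT entry (16 bytes). */
struct idt_entry {
    uint16_t offset_low;   /* Handler address bits 0..15 */
    uint16_t selector;     /* Code segment selector */
    uint8_t  ist;          /* Interrupt Stack Table index, 0 = none */
    uint8_t  type_attr;    /* P, DPL and gate type */
    uint16_t offset_mid;   /* Handler address bits 16..31 */
    uint32_t offset_high;  /* Handler address bits 32..63 */
    uint32_t reserved;
};

/* Build a present, DPL 0 gate of the given type for a handler address. */
struct idt_entry make_gate(uint64_t handler, uint16_t selector,
                           uint8_t gate_type)
{
    struct idt_entry e;
    e.offset_low  = (uint16_t)(handler & 0xFFFF);
    e.selector    = selector;
    e.ist         = 0;
    e.type_attr   = (uint8_t)(GATE_PRESENT | gate_type);
    e.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    e.offset_high = (uint32_t)(handler >> 32);
    e.reserved    = 0;
    return e;
}
```

The ist field is where a double fault handler would name its separate stack; it is left at zero in this sketch.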

June 2020

[2020-06-01 Mon]

tsp the task spooler

I knew about this before, but I'm noting it here since I'm prone to forgetting about it. task-spooler is a handy shell tool for queueing tasks. It's accessed using the binary tsp.

[2020-06-02 Tue]

Executing normal mode commands in vim on a block

If a block has been selected in visual mode, line-wise commands can be run across that block by typing :norm SEQUENCE_OF_KEYPRESSES.

make will match on the source as well as the target in multiple rules

It's possible to have multiple pattern rules for the same target pattern, discriminated by their prerequisites: make uses a rule whose prerequisites exist or can be made. This allows, for example, easy mixing of C and assembly files.

For example

%.o: %.c Makefile
	$(CC) -c $< -o $@

%.o: %.asm Makefile
	nasm -felf64 $< -o $@

How patterns match provides more information.

[2020-06-03 Wed]

Lots about TTY

This article explains a lot about the tty.

[2020-06-04 Thu]

Locked instructions can sometimes avoid locking the bus

Instructions like xchg with a memory operand behave as if they carry the lock prefix, whether or not it is written, to ensure consistency across processors.

lock ensures that the processor has exclusive access to the memory it references. In general this can raise the lock signal on the bus, but if the memory location is in the processor's cache and falls within a single cache line, only the processor's cache line is locked.

This page provides more detail, as does the Intel manual.

Some details on cache coherency and how that impacts memory models

[2020-06-05 Fri]

Embedding files in binaries

This article describes how to do this with ld. I've used this before, but making a note of it here for further use.

[2020-06-06 Sat]

firejail has a default profile

firejail has a default profile, which can be used to run any program. Without an argument, it will run a shell. For example

firejail zsh

will start zsh in a basic sandbox. In this sandbox, for example ~/.ssh is not readable.

Makefile automatic variables

GNU make has a nice page summarizing the automatic variables available in make rules. The documentation is available here.

[2020-06-07 Sun]

Escaping a stuck ssh session

Type enter followed by ~. (tilde, full stop). ~ is ssh's escape character, and it is only recognised immediately after a newline; ~. tells the client to disconnect.

[2020-06-12 Fri]

Font rendering is more interesting than it at first seems

There seem to be some interesting algorithms to render fonts efficiently. See e.g. an explanation of the stb TrueType rasterizer.

[2020-06-13 Sat]

XEmbed is used for embedding one X window inside another

XEmbed is the protocol used to embed one GUI inside another. It's what allows e.g. st to work with tabbed. tabbed expects child windows to support XEmbed.

[2020-06-16 Tue]

movsb is a repeatable instruction (takes rep prefixes) for copying strings

I missed this before despite looking for it. There are also variants for moving words, double words and quad words (movsw, movsd, movsq).

[2020-06-22 Mon]

How to cherry-pick commits with magit

Turns out this is crazy ergonomic. Start with the branch you want to pick commits into checked out. Then press l to open the git log, and o to switch to another branch. Select a branch to switch to (one where the commits you need reside).

Go to the commits you want to pick, and press A A followed by enter. You've cherry picked the commit.

This stackexchange post covers the process.

[2020-06-30 Tue]

Referencing variables from linker scripts

It's possible to define variables in linker scripts and then resolve them in source code. This is useful for a binary to have some idea of where it is in memory, and how much memory it's occupying.

For example, we can define position in a linker script:

  position = .;

And then reference it from C source code with external linkage, but only by taking its address (no memory is allocated for it in the binary; it is just an entry in the symbol table).

  extern uint8_t position;



It's a little simpler in assembly, because an extern declaration already resolves to the address:

  extern position;
  foo: dq position

Further documentation for linker script source code references can be found here.

July 2020

[2020-07-07 Tue]

man ascii is a thing

Turns out that man ascii is a thing (ASCII has a man page), with a convenient table of characters.