Today I learned
This is an account of things I've learned day-by-day. Some are things I've looked up and maybe even used before, and subsequently forgotten.
Some entries are mostly links to good articles that I've read.
May 2020
[2020-05-12 Tue]
date can be used to convert between timezones
For example:
$ date --date='TZ="US/Pacific" 09:00 next Fri'
Fri 15 May 17:00:00 BST 2020
[2020-05-13 Wed]
Assembly and C linkage
A clarification, more than a TIL. Given a label foo in assembly
foo:
    db 'This is a string',0
and a declaration in C
extern char foo;
foo in C will refer to the data at label foo, in this case 'T', whereas &foo will provide the address of the label foo.
If the string foo is desired, declare foo as
extern char foo[];
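A minimal sketch of the difference, assuming GCC or Clang with GNU as on an ELF target such as Linux. To keep it in one file, the label is emitted with a top-level __asm__ block rather than a separate NASM file, and two hypothetical labels (foo_byte and foo_str) mark the same address so that both kinds of declaration can coexist in one translation unit.

```c
/* Emit the string in .rodata under two labels at the same address.
   Assumption: GNU as directives on an ELF target. */
__asm__(
    ".pushsection .rodata\n"
    ".globl foo_byte\n"
    ".globl foo_str\n"
    "foo_byte:\n"
    "foo_str:\n"
    "    .asciz \"This is a string\"\n"
    ".popsection\n");

extern const char foo_byte;   /* names the single char stored at the label */
extern const char foo_str[];  /* names the array that starts at the label */

/* Reads the data at the label: the first character of the string. */
char first_char(void) { return foo_byte; }

/* Taking the address of the char yields the address of the label. */
const char *label_address(void) { return &foo_byte; }

/* The array declaration decays to a pointer to the whole string. */
const char *whole_string(void) { return foo_str; }
```

The array form is usually what is wanted for string data, since it can be passed straight to functions expecting a char pointer.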
[2020-05-14 Thu]
gdb can connect to a remote server for debugging
gdb can connect over TCP, a Unix socket etc. to a remote binary (even a gdb server, which can launch and/or attach to any process). Debug symbols are only needed in a local binary.
qemu has some integration with this functionality to help debug kernels etc.
org-mode can insert dates with C-c !
Handy for writing the date next to items in org-mode. Uses a calendar for selection.
iret vs iretq
Returning from an interrupt handler in 64-bit mode needs a special instruction, because plain iret defaults to a 32-bit return. iretq does the 64-bit return.
[2020-05-15 Fri]
Linux calling convention stack alignment
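The System V AMD64 calling convention used on Linux expects the stack to be 16-byte aligned at the point of a call. call pushes the return address to the stack (8 bytes), so on entry to a function the stack is 8 bytes off alignment. Before making a call of its own, a function therefore needs to have pushed an odd number of 8-byte values (or otherwise adjusted rsp by 8 modulo 16).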
Linux allows multiple returns in registers
More precisely, integers are returned in rax or rdx:rax, while floating point numbers are returned in xmm0 or xmm1:xmm0.
x86_64 has conditional jump, mov and set instructions
Conditional jump is well known and standard, but (the prefix) cmov provides a conditional move, and (prefix) set provides a conditional set.
These operations can be conditioned on flags such as (s)sign, (z)zero, (c)carry or (o)overflow. (n) can be added to negate.
For example cmovno rax, rbx moves rbx into rax if the overflow flag is clear, i.e. the previous flag-setting instruction did not overflow. Note that cmov takes a register or memory source, not an immediate.
NASM has a nice tutorial
NASM has a tutorial.
[2020-05-16 Sat]
hlt waits for an interrupt to come in
The instruction hlt waits for an interrupt. Kernels often boil down to installing a bunch of interrupt handlers, and then entering a very simple event loop:
loop:
hlt
jmp loop
Handlers for double faults can't return
Implementing handlers for double faults is useful nonetheless, for outputting diagnostic information. Care needs to be taken for stack overflows, since the interrupt handler won't be able to push to the stack. For this reason it's a good idea to have a separate stack for the double fault handler, and use long mode's interrupt stack switching mechanism (the Interrupt Stack Table) to handle it.
This requires setting up a TSS (Task state segment).
nm can be used to inspect symbols in an object file
This is a nice alternative to objdump -T.
objdump -h can be used to look at the sections in a binary
It's interesting to compare the sections added by gcc versus an executable built with nasm and linked directly with ld.
The basics of linker scripts
This page gives a nice overview of the basics. Linker scripts are handy for controlling how executables are laid out in memory.
The GNU linker allows symbols to be versioned
The linker scripts introductory article gives the basics, but there is a GNU extension which allows symbols to be exposed by version. It's not clear this is a good idea. Linker scripts can be explicit about what they want to link against.
__asm__(".symver current_foo, foo@");
__asm__(".symver v1_foo, foo@VERS_1.1");
[2020-05-18 Mon]
Freeing memory can impact latency by flushing TLBs across cores
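Freeing memory can unmap pages, which invalidates translation lookaside buffer (TLB) entries and can require them to be flushed on every core that has the mapping cached. The flush itself and the resulting TLB misses can impact latency in a system.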
GRUB has tools for checking multiboot headers
GRUB comes with tools for checking multiboot headers
$ grub-file --is-x86-multiboot build/kernel.x86_64
$ grub-file --is-x86-multiboot2 build/kernel.x86_64
Both return 0 on success, and 1 on failure.
Note, to boot a multiboot 2 kernel, GRUB needs to boot the kernel with
multiboot2 /path/to/kernel
as opposed to:
multiboot /path/to/kernel
[2020-05-20 Wed]
Sized data can be loaded into registers by using the correct names
Some registers have names for subranges. For example,
mov al, [pos] ; Loads byte [pos] into lower byte of ax
mov ah, [pos] ; Loads byte [pos] into upper byte of ax
mov ax, [pos] ; Loads 16-bit word [pos] into ax
mov eax, [pos] ; Loads 32-bit double word into eax
mov rax, [pos] ; Loads 64-bit quad word into rax
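The aliasing between these names can be modelled in C with a union. This is only an illustrative sketch: the type rax_model and the function demo_subregisters are hypothetical names, and the layout assumes a little-endian host such as x86_64. It also does not model one real hardware detail, namely that a 32-bit write to eax zeroes the upper half of rax.

```c
#include <stdint.h>

/* Hypothetical model of rax and its sub-register names.
   Assumes a little-endian host, so the low byte comes first in memory. */
typedef union {
    uint64_t rax;
    uint32_t eax;
    uint16_t ax;
    struct {
        uint8_t al; /* low byte of ax */
        uint8_t ah; /* high byte of ax */
    } bytes;
} rax_model;

/* Writes al and ah and returns the resulting 64-bit value,
   showing that only the two lowest bytes change. */
uint64_t demo_subregisters(void) {
    rax_model r;
    r.rax = 0x1122334455667788ULL;
    r.bytes.al = 0xAA;
    r.bytes.ah = 0xBB;
    return r.rax;
}
```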
[2020-05-22 Fri]
It's possible to test individual bits using the bt instruction
The bt instruction allows you to test a bit, storing the result in the carry bit. Jump on carry can be used to act on this. For example
bt eax, 4 ; Test bit 4 in eax
jnc bit_not_set ; Jump to bit_not_set if bit 4 wasn't set
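For comparison, the same test written in C is a shift and a mask. The function name test_bit is hypothetical; the masking of the bit index mirrors how bt treats a register operand, where the bit offset is taken modulo the operand size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the given bit of value is set, like the carry flag after
   bt on a 32-bit register. The index wraps modulo 32. */
bool test_bit(uint32_t value, unsigned bit) {
    return ((value >> (bit & 31u)) & 1u) != 0;
}
```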
C-r inverts colours in zathura
This makes reading some pdfs much more friendly on the eyes.
[2020-05-23 Sat]
How to use x86_64 repeat string operations
This is useful for mass zeroing memory. String instructions such as stosb (store byte string) can be repeated. stosb will copy AL to the address in es:edi (rdi in 64-bit mode), and increment or decrement edi afterwards (incrementing if the DF flag in EFLAGS is zero, and decrementing if it is one).
rep will repeat a string operation until the value in ecx (rcx in 64-bit mode) is zero, so to zero 100 bytes starting at a label foo we can write:
xor eax, eax ; Clear eax (and hence al)
mov ecx, 100 ; Counter for 100 bytes
mov edi, foo ; The address we want to start writing to
rep stosb ; Zero the bytes!
foo: resb 100
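What rep stosb does can be spelled out as a C loop. This is a sketch of the semantics only, assuming the direction flag is clear so the destination pointer increments; the function name rep_stosb and its parameters are hypothetical stand-ins for the registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Models rep stosb with DF = 0: dst plays the role of edi/rdi, al the byte
   to store, count the value in ecx/rcx. Returns the final destination
   pointer, as edi/rdi would hold after the instruction. */
uint8_t *rep_stosb(uint8_t *dst, uint8_t al, size_t count) {
    while (count != 0) {
        *dst++ = al; /* store the byte and advance the destination */
        count--;     /* rep decrements the counter each iteration */
    }
    return dst;
}
```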
Page table structures
This is something I knew before, but had forgotten. Page table entries on x86_64 (and presumably other architectures which use something like paging) contain addresses of subsequent structures, either pages in memory, or further page table structures. Each page table entry is essentially the address of the next structure that matters, with the bits that aren't used filled in with control bits. An obvious idea, but very neat!
To make this clearer, suppose a page table structure is pointing at a 2MB page, at address addr. If addr has length n bits, then the page table entry has the structure CTL_BITS:addr[n:21]:CTL_BITS, since the low 21 bits (bits 0 to 20) are used as an offset into the page, and x86_64 doesn't have full 64-bit addressing.
A simple idea, and in some way the obvious way to do it, but it saves shifting chunks of addresses around. Some faff is still required in 32-bit protected mode however, because only 32-bit double words can be written, and the address straddles the word boundary.
In addition to this, because pages and page table structures need to be aligned on appropriate boundaries, the lower bits are guaranteed to be zero, so they don't need to be manually cleared.
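As a rough sketch of the layout described above, an entry for a 2MB page can be built by OR-ing flag bits into the aligned address. The macro and function names are hypothetical; the bit positions used (present in bit 0, writable in bit 1, page size in bit 7, address in bits 21 to 51) follow the x86_64 paging format.

```c
#include <stdint.h>

#define PTE_PRESENT  (1ULL << 0) /* entry is valid */
#define PTE_WRITABLE (1ULL << 1) /* writes are allowed */
#define PTE_HUGE     (1ULL << 7) /* entry maps a 2MB page directly */

/* Bits 21..51 of a page directory entry hold the 2MB page address. */
#define PDE_2MB_ADDR_MASK 0x000FFFFFFFE00000ULL

/* Combines a 2MB-aligned physical address with control bits. No shifting
   is needed because the address already sits in the right bit positions. */
uint64_t make_2mb_pde(uint64_t phys_addr, uint64_t flags) {
    return (phys_addr & PDE_2MB_ADDR_MASK) | flags;
}

/* Recovers the page address by masking the control bits away. */
uint64_t pde_2mb_address(uint64_t entry) {
    return entry & PDE_2MB_ADDR_MASK;
}
```

For an address that is already 2MB aligned the mask in make_2mb_pde is a no-op, which is the point made above about not needing to clear the low bits manually.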
[2020-05-24 Sun]
glibc has hooks for memory allocation functions
These allow you to intercept calls to malloc / calloc / free etc. musl libc doesn't appear to have the same utilities.
See the documentation on these hooks for more details.
[2020-05-25 Mon]
The GNU linker allows external symbols to be wrapped
The GNU linker allows a --wrap= option to be specified to allow e.g. interception of calls to malloc.
The GNU linker documentation has more details (see --wrap).
The gold linker is installed as part of binutils
The gold linker, distinct from the GNU linker (ld), is installed as part of binutils. It can be invoked on a system using gold.
[2020-05-29 Fri]
Defining themes in emacs
Defining basic themes in emacs is quite straightforward:
(deftheme mytheme)
(let ((color-bg "#2f4858")
      (color-bg-alt "#25333d")
      (color-fg "#ffedcb")
      (color-yellow "#d79921")
      (color-orange "#f56800")
      (color-green "#d6d129")
      (color-cyan "#009c89")
      (color-red "#d92929")
      (color-grey "#b6a99a"))
  (custom-theme-set-faces
   'mytheme
   `(default ((t (:foreground ,color-fg :background ,color-bg-alt))))
   `(fringe ((t (:foreground ,color-fg :background ,color-bg-alt))))
   `(highlight ((t (:foreground nil :background ,color-bg))))
   ;; Font lock faces
   `(font-lock-builtin-face ((t (:foreground ,color-fg))))
   `(font-lock-comment-face ((t (:foreground ,color-cyan))))
   `(font-lock-constant-face ((t (:foreground ,color-fg))))
   `(font-lock-function-name-face ((t (:foreground ,color-green :weight bold))))
   `(font-lock-keyword-face ((t (:foreground ,color-orange))))
   `(font-lock-string-face ((t (:foreground ,color-grey))))
   `(font-lock-type-face ((t (:foreground ,color-yellow :weight bold))))
   `(font-lock-variable-name-face ((t (:foreground ,color-fg))))))
(provide-theme 'mytheme)
Additional customizations can be set to get a reasonable appearance.
[2020-05-31 Sun]
The difference between interrupt gates and trap gates
When execution moves through a trap gate, further external hardware interrupts are not masked (the IF flag is not cleared).
Interrupt gates ensure that the IF bit is cleared, so further external interrupts cannot happen during the execution of the interrupt handler.
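In the IDT the two kinds of gate differ only in the type field of the descriptor: 0xE for an interrupt gate and 0xF for a trap gate. The sketch below builds a 64-bit mode IDT entry under that assumption. The struct, enum and function names are hypothetical, the field layout follows the long mode gate descriptor format, and the entry is always marked present with DPL 0 to keep the example short.

```c
#include <stdint.h>

enum {
    GATE_INTERRUPT = 0xE, /* IF is cleared on entry */
    GATE_TRAP      = 0xF  /* IF is left unchanged on entry */
};

/* 16-byte gate descriptor as used in 64-bit mode. */
typedef struct {
    uint16_t offset_low;  /* handler address bits 0..15 */
    uint16_t selector;    /* code segment selector */
    uint8_t  ist;         /* bits 0..2: interrupt stack table index, 0 = none */
    uint8_t  type_attr;   /* present bit, DPL and gate type */
    uint16_t offset_mid;  /* handler address bits 16..31 */
    uint32_t offset_high; /* handler address bits 32..63 */
    uint32_t reserved;
} idt_entry;

/* Builds a present, DPL 0 gate of the given type for a handler address. */
idt_entry make_gate(uint64_t handler, uint16_t selector, uint8_t gate_type) {
    idt_entry e;
    e.offset_low  = (uint16_t)(handler & 0xFFFF);
    e.selector    = selector;
    e.ist         = 0;
    e.type_attr   = (uint8_t)(0x80 | (gate_type & 0x0F));
    e.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    e.offset_high = (uint32_t)(handler >> 32);
    e.reserved    = 0;
    return e;
}
```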
June 2020
[2020-06-01 Mon]
tsp the task spooler
I knew about this before, but I'm noting it here since I'm prone to forgetting about it. task-spooler is a handy shell tool for queueing tasks. It's accessed using the binary tsp.
[2020-06-02 Tue]
Executing normal mode commands in vim on a block
If a block has been selected in visual mode, line-wise commands can be run across that block by typing :norm SEQUENCE_OF_KEYPRESSES.
make will match on the source as well as the target in multiple rules
It's possible to have multiple rules for a target which are discriminated by the dependencies. This allows, for example, easy mixing of C and assembly files.
For example
%.o: %.c Makefile
	$(CC) -c $< -o $@
%.o: %.asm Makefile
	nasm -felf64 $< -o $@
How patterns match provides more information.
[2020-06-03 Wed]
Lots about TTY
This article explains a lot about the tty.
[2020-06-04 Thu]
Locked instructions can sometimes avoid locking the bus
Instructions like xchg with a memory operand behave as if prefixed with lock, to ensure consistency across processors.
lock ensures that the processor has exclusive access to the memory it references. In general this can raise the lock signal on the bus, but if the memory location is in the processor's cache, and within a single cache line, only the processor's cache line is locked.
This page provides more detail, as does the Intel manual.
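From C, the usual way to get at this kind of locked exchange is C11 atomics. Below is a minimal spinlock sketch built on atomic_exchange; the type and function names are hypothetical, and the claim that compilers lower this to xchg on x86_64 describes typical GCC and Clang output rather than anything the C standard guarantees.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock: 0 means free, 1 means held. */
typedef struct {
    atomic_int locked;
} spinlock;

/* Atomically swaps in 1 and reports whether the lock was previously free. */
bool spin_try_lock(spinlock *lock) {
    return atomic_exchange(&lock->locked, 1) == 0;
}

/* Spins until the exchange observes the lock as free. */
void spin_lock(spinlock *lock) {
    while (!spin_try_lock(lock)) {
        /* busy wait */
    }
}

/* Releases the lock by storing 0. */
void spin_unlock(spinlock *lock) {
    atomic_store(&lock->locked, 0);
}
```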
Some details on cache coherency and how that impacts memory models
Here is a good article on this.
[2020-06-05 Fri]
Embedding files in binaries
This article describes how to do this with ld. I've used this before, but making a note of it here for further use.
[2020-06-06 Sat]
firejail has a default profile
firejail has a default profile, which can be used to run any program. Without an argument, it will run a shell. For example
firejail zsh
will start zsh in a basic sandbox. In this sandbox, for example ~/.ssh is not readable.
Makefile automatic variables
GNU make has a nice page summarizing the automatic variables available in make rules. The documentation is available here.
[2020-06-07 Sun]
Escaping a stuck ssh session
Type enter followed by ~. (tilde, fullstop). The combination of enter followed by ~ acts as an escape character.
[2020-06-12 Fri]
Font rendering is more interesting than it at first seems
There seem to be some interesting algorithms to render fonts efficiently. See e.g. an explanation of the stb TrueType rasterizer.
[2020-06-13 Sat]
XEmbed is used for embedding one X window inside another
XEmbed is the protocol used to embed one GUI inside another. It's what allows e.g. st to work with tabbed. tabbed expects child windows to support XEmbed.
[2020-06-16 Tue]
movsb is a repeatable instruction (takes rep prefixes) for copying strings
I missed this before despite looking for it. There are also examples for moving words and doubles.
[2020-06-22 Mon]
How to cherry-pick commits with magit
Turns out this is crazy ergonomic. Start with the branch you want to pick commits into checked out. Then press l to open the git log, and o to switch to another branch. Select a branch to switch to (one where the commits you need reside).
Go to the commits you want to pick, and press A A followed by enter. You've cherry picked the commit.
[2020-06-30 Tue]
Referencing variables from linker scripts
It's possible to define variables in linker scripts and then resolve them in source code. This is useful for a binary to have some idea of where it is in memory, and how much memory it's occupying.
For example, we can define position in a linker script:
position = .;
And then reference it from C source code with external linkage, but only by taking its address (there is no memory allocated in the binary, only an entry in the symbol table).
extern uint8_t position;
...
&position
It's a little simpler in assembly, because an extern declaration already resolves to the address
extern position;
foo: dq position
Further documentation for linker script source code references can be found here.
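The same pattern can be tried without writing a linker script, because the default linker scripts on Linux already define the symbols etext, edata and end (see man 3 end). This is a sketch assuming a GNU toolchain on Linux; the function names are hypothetical, and as above only the addresses of the symbols are meaningful.

```c
#include <stdint.h>

/* Defined by the linker, not by any object file. Only their addresses are
   meaningful: there is no storage behind them. */
extern char etext, edata, end;

/* First address past the end of the text segment. */
uintptr_t text_end(void) { return (uintptr_t)&etext; }

/* First address past the end of the initialized data segment. */
uintptr_t data_end(void) { return (uintptr_t)&edata; }

/* First address past the end of the uninitialized data (bss) segment. */
uintptr_t bss_end(void) { return (uintptr_t)&end; }

/* Size in bytes of the bss segment, computed purely from symbol addresses. */
uintptr_t bss_size(void) { return bss_end() - data_end(); }
```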
July 2020
[2020-07-07 Tue]
man ascii is a thing
Turns out that man ascii is a thing (ASCII has a man page), with a convenient table of characters.