Orion language and compiler testing

December 3, 2024

My language and compiler (“orion”) is getting to the point I could use it to write some vaguely real software. This year it got a new backend and intermediate representation which ought to make it easier for me to a) add new features and b) get greater leverage and consistency across existing features.

However, the language still feels a bit rough to use in many (most?) areas, so it’s useful to find programs to write to work out where the kinks are. Advent of code provides some problems of this kind, so I had a go with day 1.

Since this is my own language there is no standard library (yet), so I’ve used a small amount of libc for now, which more or less translates one-to-one with syscalls (which I can wrap separately / implement better later).

I wrote also some very rough utilities to help solve the problem. These are a long way from good, on any level, but they provide some initial ideas for things that might belong in a standard library.

Since I can look at the data, practically no validation is done, and error handling is minimal. The main aim is to get a feel for using the language, and start shaping a minimal set of useful tools that I can go and implement properly later.

Things learned/affirmed

There are some obvious things I know need doing (for loops, multiple returns…). Here are some others that stood out from this exercise.

Character comparisons are currently clumsy.
More compact += modifiers would improve ergonomics.
Although not present in the solution below, initializing arrays in structs is not really possible at the moment without what is essentially a workaround (build an array, copy it in).
Array indexing mandating usize is essentially inflates structure sizes (often arrays are small and a u32 or smaller would do), or forces conversions/casts to be used everywhere.
- Because the value of an index doesn’t propagate outside the indexing expression, it’s probably reasonable to auto-convert the types here. Inference won’t cause the conversion to bleed out to other value declarations.
- Still undecided as to whether the default size type should be signed. Bounds checking can help catch the negative cases, should they happen (which is better than missing a massive unsigned index caused by a bad subtraction).

Bugs fixed

A bad assert on variadic procedure calls in IR lowering - got triggered when no variadic arguments were passed (boundary check overly strong).

Solution

//
// Day 1 advent-of-code in orion.
//
// Compiled with
//   oc -c day1.oc -o day1.o && cc -o day1 day1.o
// where the cc is really just linking libc and the C runtime to get us into main.
// These are both relatively easily replaced.
//

//
// A couple of bindings are used from libc to get us off the ground
//
O_RDONLY :: 0;
open : (*c_char, i32) -> i32 : @external;
printf : (*c_char, ...) -> i32 : @external;
read : (i32, *u8, usize) -> i64 : @external;

//
// Input buffering structure 
//
Ibuf :: struct {
	buf    : []u8;
	count  : usize;
	used   : usize;
	fd     : i32;
};

ibuf_new :: (fd : i32, buf: []u8) -> Ibuf
{
	return Ibuf {
		buf     = buf,
		count   = 0,
		used    = 0,
		fd      = fd
	};
}

ibuf_shift :: (ibuf : *Ibuf) -> void
{
	next := 0;

	while ibuf.used + next < ibuf.count {
		ibuf.buf[next] = ibuf.buf[ibuf.used + next];
	}

	ibuf.count = ibuf.count - ibuf.used;
	ibuf.used  = 0;

	return;
}

ibuf_fill :: (ibuf : *Ibuf) -> void
{
	numbytes := read(ibuf.fd, &ibuf.buf[ibuf.used], ibuf.buf.length - ibuf.count);

	// TODO Errors - we're lazy here and just ignore the error - the user
	// will spot no new bytes are read and bail out.
	if numbytes >= 0 {
		ibuf.count = ibuf.count + cast(usize)numbytes;
	}

	return;
}

ibuf_next :: (ibuf : *Ibuf, out : *u8) -> bool
{
	if ibuf.used == ibuf.count {
		ibuf_shift(ibuf);
		ibuf_fill(ibuf);
	}

	// If we fail to get new characters, just assume EOF
	if ibuf.used == ibuf.count {
		return false;
	}

	*out = ibuf.buf[ibuf.used];
	ibuf.used = ibuf.used + 1;
	return true;
}

ibuf_i32 :: (ibuf : *Ibuf, out : *i32) -> bool
{
	acc := 0;

	b : u8;
	in_number := false;
	while ibuf_next(ibuf, &b) {
		if c_char(b) == '\n' || c_char(b) == ' ' {
			if in_number {
				*out = acc;
				return true;
			}
		}

		if i32('0') <= i32(b) && i32(b) <= i32('9')
		{
			in_number = true;
			acc = acc * 10 + i32(b) - i32('0');
		}
	}

	if in_number {
		*out = acc;
		return true;
	}
	   
	return false;
}

//
// Sorting.
//
quicksort :: (ns : []i32) -> void
{
	if ns.length <= 1 {
		return;
	}

	pivot := ns[ns.length - 1];

	l := 0;
	r := ns.length - 1;

	// r is an empty slot, everything to the left of l is < pivot,
	// everything to the right of r is >= pivot.
	while l < r {
		if ns[l] < pivot {
			l = l + 1;
		} else {
			ns[r] = ns[l];
			r = r - 1;
			ns[l] = ns[r];
		}
	}

	ns[r] = pivot;

	quicksort([]i32{ data = ns.data, length = r});
	quicksort([]i32{ data = &ns[r + 1], length = ns.length - r - 1});

	return;
}

//
// Solution to problem.
//

// Looking at the data files, we can preallocate buffers for the numbers.
buf_a : [i32 | 1024];
buf_b : [i32 | 1024];

main :: () -> i32
{
	fd := open("data/day1-input", O_RDONLY);

	if fd < 0 {
	   printf("failed to open input file\n"); // TODO stderr
	   return 1;
	}

	ibuf_mem : [u8 | 128];
	ibuf := ibuf_new(fd, []u8(&ibuf_mem));

	list_a := []i32 { data = &buf_a[0], length = 0 };
	list_b := []i32 { data = &buf_b[0], length = 0 };

	count : i32;
	n : i32;

	while ibuf_i32(&ibuf, &n) {
		if (count & 0x1) == 0 {
			list_a[list_a.length] = n;
			list_a.length = list_a.length + 1;
		} else {
			list_b[list_b.length] = n;
			list_b.length = list_b.length + 1;
		}

		count = count + 1;
	}

	quicksort(list_a);
	quicksort(list_b);

	distance := 0;
	i := 0;
	while i < list_a.length {
		delta := list_b[i] - list_a[i];

		if delta < 0 {
			delta = -delta;
		}

		distance = distance + delta;
		i = i + 1;
	}

	printf("Part 1: %d\n", distance);
	printf("Part 2: %d\n", similarity_score(list_a, list_b));

	return 0;
}

// Assumes l and r are sorted.
similarity_score :: (l : []i32, r: []i32) -> i32
{
	score := 0;

	i := 0;
	j := 0;
	while i < l.length {
		count := 0;

		while j < r.length && r[j] < l[i] {
			j = j + 1;
		}

		while j < r.length && r[j] == l[i] {
			count = count + 1;
			j = j + 1;
		}

		score = score + l[i] * count;
		i = i + 1;
	}

	return score;
}