A simple fuzzy matcher

February 29, 2024

I like interfaces which permit fuzzy matching. I always imagined the algorithms for fuzzy matching over a large body of data to be quite sophisticated (I suppose they can be), but realised you can get a reasonable way with very little.

The following implements a very basic fuzzy matcher. This was inspired by a skim of the fzf source code (or at least one of the comments), but is not quite the same algorithm. The intention here is to demonstrate the technique, so it can be adjusted to specific needs. There are some obvious extensions that can make it more useful (case insensitivity for example).

static int
fuzzy_matches(const char *pattern, const char *text)
{
	const char *s = text;

	while (*pattern != '\0')
	{
		if (*s == ' ')		{ s++; continue; }
		if (*pattern == ' ')	{ pattern++; s = text; continue; }
		if (*s == '\0')		{ break; }
		if (*pattern == *s)	{ pattern++; }
		s++;
	}

	return *pattern == '\0';
}

How does this work? pattern is considered as space separated chunks, and then each chunk is used to seek a subsequence of text which matches the letters in that chunk. So if pattern is "mth fzy", and text is "fuzzy match", then pattern will be considered in two chunks, "mth" and "fzy":

mth:       fuzzy match
                 ^ ^ ^

fzy:       fuzzy match
           ^ ^ ^

yielding a match. If we run off the end of the text before a chunk has completed, we exit the loop, and return a non-match.

A little driver code shows that this works reasonably:

#include <stdio.h>

static void
print_match(const char *pattern, const char *text)
{
	printf("fuzzy_matches(\"%s\", \"%s\"): %s\n", pattern, text,
	       fuzzy_matches(pattern, text) ? "yes" : "no");
}

int
main()
{
	print_match("hlo",	"hello");
	print_match("I tpt",	"I'm a teapot");
	print_match("fllw",	"What level is this?");
	print_match("two one",	"one two");

	return 0;
}

Which when run produces the output:

fuzzy_matches("hlo", "hello"): yes
fuzzy_matches("I tpt", "I'm a teapot"): yes
fuzzy_matches("fllw", "What level is this?"): no
fuzzy_matches("two one", "one two"): yes