A simple fuzzy matcher
February 29, 2024
I like interfaces which permit fuzzy matching. I always imagined the algorithms for fuzzy matching over a large body of data to be quite sophisticated (I suppose they can be), but realised you can get a reasonable way with very little.
The following implements a very basic fuzzy matcher.
This was inspired by a skim of the fzf
source code (or at least one of the comments), but is not quite the same algorithm.
The intention here is to demonstrate the technique, so it can be adjusted to specific needs.
There are some obvious extensions that can make it more useful (case insensitivity for example).
static int
fuzzy_matches(const char *pattern, const char *text)
{
const char *s = text;
while (*pattern != '\0')
{
if (*s == ' ') { s++; continue; }
if (*pattern == ' ') { pattern++; s = text; continue; }
if (*s == '\0') { break; }
if (*pattern == *s) { pattern++; }
s++;
}
return *pattern == '\0';
}
How does this work?
pattern
is considered as space separated chunks, and then each chunk is used to seek a subsequence of text
which matches the letters in that chunk.
So if pattern
is "mth fzy"
, and text is "fuzzy match"
, then pattern will be considered in two chunks, "mth"
and "fzy"
:
mth: fuzzy match
^ ^ ^
fzy: fuzzy match
^ ^ ^
yielding a match. If we run off the end of the text
before a chunk has completed, we exit the loop, and return a non-match.
A little driver code shows that this works reasonably:
#include <stdio.h>
static void
print_match(const char *pattern, const char *text)
{
printf("fuzzy_matches(\"%s\", \"%s\"): %s\n", pattern, text,
fuzzy_matches(pattern, text) ? "yes" : "no");
}
int
main()
{
print_match("hlo", "hello");
print_match("I tpt", "I'm a teapot");
print_match("fllw", "What level is this?");
print_match("two one", "one two");
return 0;
}
Which when run produces the output:
fuzzy_matches("hlo", "hello"): yes
fuzzy_matches("I tpt", "I'm a teapot"): yes
fuzzy_matches("fllw", "What level is this?"): no
fuzzy_matches("two one", "one two"): yes