Introducing shroudage

Published: 14 September, 2024, updated: 14 September, 2024

tl;dr

shroudage is a shell wrapper around age to set up transparent encryption of content in a git repository, allowing you to have plaintext in your working copy, while keeping the contents encrypted in the repository.

The only dependency is age itself.

Quick overview

Running shroudage init will configure a repository to use the git filters. There are two pieces - the in-repository files, and the repository configuration files.

The in-repository files are populated on the first run: a passphrase-encrypted age key (created for you if it doesn’t exist), a .gitattributes file that defines which files use which git filters, and a copy of the shroudage script so you don’t need to install it separately in future.

The repository configuration defines how the filters actually work, and it needs to be set up in every clone of the repository. If you don’t do this, you’ll just see the encrypted contents. The filters are clean (working copy to index/commit) and smudge (commit to working copy).
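As a rough sketch of what the two pieces amount to, the functions below wire up a clean and smudge filter by hand. The filter name shroudage, the smudge subcommand, the %f path argument and the content/** pattern are assumptions made for illustration; shroudage init is what writes the real settings, and they may differ.

```shell
# Sketch only: the filter name, the subcommand arguments and the attribute
# pattern are assumptions, not necessarily what `shroudage init` writes.

# Per-clone half: lives in .git/config, so it is not carried along by a clone.
configure_filters() {
    # clean: working copy -> index/commit (encrypt)
    git config filter.shroudage.clean '.scripts/shroudage clean %f'
    # smudge: commit -> working copy (decrypt)
    git config filter.shroudage.smudge '.scripts/shroudage smudge %f'
    # Treat a failing filter as an error instead of passing content through.
    git config filter.shroudage.required true
}

# In-repository half: tells git which paths go through the filter.
write_gitattributes() {
    printf '%s\n' 'content/** filter=shroudage' >> .gitattributes
}
```

Setting filter.<name>.required to true is a standard git option. It makes git stop with an error when the filter command fails, rather than quietly storing the unfiltered content.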

By default, the content folder will be transparently encrypted for you.

You can use git diff --staged --no-textconv to ensure that when you add a file to the staging area, it is being encrypted as you expect. Similarly, if you have a commit, you can use git show --no-textconv to verify it. Then you can push it to upstream.

Otherwise, git diff and git show will show you plaintext changes.

On a new clone, you’ll need to run .scripts/shroudage init first, and then git restore content to decrypt the files.

The full story

Run-up

I’ve been wanting to try out diary-style journaling lately to augment some other types of journaling I’ve been doing, and had some requirements that were hard to find in existing tools. I wanted something low-friction, with the ability to synchronize across devices (and not just Apple devices), while also being encrypted.

Ideally, it would be text-based, easy to search and navigate, and so forth.

I did look through a bunch of options, but they all had some non-trivial irritations. At the end of the day, it’s basically what I want from the environment I use today - text files, version control, and so forth, just with encryption.

I don’t know where the idea of git filters came from exactly. I’ve been researching what’s changed in the security space since I last spent time in it, and I’d encountered age along the way, including people using age-encrypted files in public repositories. Somewhere in there, the idea of using age with git filters to transparently encrypt content in a git repository appeared.

I quickly found git-crypt, and transcrypt, which looked like they would work, but for whatever reason I wanted to look at age, and continued until I found git-agecrypt.

At this point, I almost committed, but then I ran into a blog post about using the age command line directly as a filter, and I tried this out. And it worked-ish! Nice and simple use of an existing tool without much overhead. Easy to understand. No unnecessary dependencies.

This allowed me to start that journaling.

The issue

This worked-ish, as I said, and the -ish was a bit annoying - other checkouts of the repository would show the files as different from what is in the repository in git status.

The short version is that encrypting the same content twice won’t generate the same encrypted content. In these cases git viewed the files on disk as changed and ran them through the clean filter again, which re-encrypted them, so the result differed from what was already in the repository.

The solution

git-agecrypt had a solution for this - it maintains its own cache that maps a hash of the plaintext content to the encrypted content, so it can consult the cache and return the stored encrypted content if the plaintext hasn’t changed.

Again, probably easiest to move over to it?

I then realized that this cached copy wasn’t really even needed - git already has the encrypted content in the repository, and I could just fetch it. It would mean moving away from the simple command line, though…

So I needed a name for the script (the most important thing, obviously), and I quickly created a wrapper around those commands. Then I updated the clean subcommand to query the git repository itself for HEAD’s copy of the file, and to reuse the encrypted content if the plaintext hasn’t changed.
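The reuse step can be sketched roughly as follows. This is not the actual shroudage code: the function names, the SHROUDAGE_RECIPIENTS and SHROUDAGE_IDENTITY variables, and the way the key is supplied to age are all hypothetical, and unlocking the passphrase-protected key is left out entirely.

```shell
# Sketch of a clean filter that reuses HEAD's ciphertext when the plaintext
# is unchanged. encrypt_stdin and decrypt_stdin are hypothetical helpers, and
# the two SHROUDAGE_* variables are assumptions made for this example.
encrypt_stdin() { age --encrypt --recipients-file "$SHROUDAGE_RECIPIENTS"; }
decrypt_stdin() { age --decrypt --identity "$SHROUDAGE_IDENTITY"; }

# Usage: clean_with_reuse <path-in-repo> < plaintext > ciphertext
clean_with_reuse() {
    file_path=$1
    # mktemp creates the files readable by the current user only.
    new_plain=$(mktemp) || return 1
    old_plain=$(mktemp) || return 1
    cat > "$new_plain"

    # If HEAD already has this path, and decrypting HEAD's copy gives exactly
    # the plaintext we were handed, emit the stored ciphertext byte for byte.
    if git cat-file -e "HEAD:$file_path" 2>/dev/null &&
       git cat-file blob "HEAD:$file_path" | decrypt_stdin > "$old_plain" 2>/dev/null &&
       cmp -s "$new_plain" "$old_plain"; then
        git cat-file blob "HEAD:$file_path"
    else
        # New file, changed plaintext, or decryption failed: encrypt afresh.
        encrypt_stdin < "$new_plain"
    fi
    rc=$?
    rm -f "$new_plain" "$old_plain"
    return "$rc"
}
```

Because an unchanged file produces the very same bytes that HEAD already holds, git status has nothing to report for it. The cost is one decryption per cleaned file, in place of a separate cache.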

Scaling

Git appears to do a good job of avoiding work it doesn’t need to do. Running strace -f -e trace=%process git diff (and similar operations) shows that it only ends up calling shroudage for changed content. I wasn’t exactly planning on doing this with tens of thousands of files, but even a hundred or two files will eventually be slow if there are operations that run on all of them.

Anyway, nice to know it’ll support a few years of daily journaling.

Limitations and concerns

Probably the biggest limitation right now is that it requires vigilance to validate that it is operating the way you expect, and the commands to do this are a little esoteric. Feeling like something might just silently fail to run and suddenly there’s plaintext in the repository isn’t all that reassuring.

Maybe something like a pre-commit hook would help to make this less fraught? And maybe some check target that actually stages a change to validate things are working?
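One possible shape for such a hook, purely as a sketch and not something shroudage ships: look at what is staged under content/ and refuse the commit if any of it does not begin with an age header. The function names are made up for this example, and it assumes the default content folder and file names without unusual characters.

```shell
# Sketch of a pre-commit check; function names are hypothetical.

# Succeeds if stdin starts with an age header line.
looks_age_encrypted() {
    # age output begins with "age-encryption.org/v1", or with the PEM-style
    # line below when ASCII armor is used.
    IFS= read -r first_line || [ -n "$first_line" ] || return 1
    case $first_line in
        age-encryption.org/v1|'-----BEGIN AGE ENCRYPTED FILE-----') return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds only if every added or modified file staged under content/
# is stored in the index as age output.
check_staged() {
    bad=$(git diff --cached --name-only --diff-filter=ACMR -- content/ |
        while IFS= read -r f; do
            # ":<path>" names the staged (index) version of the file.
            git cat-file blob ":$f" | looks_age_encrypted || printf '%s\n' "$f"
        done)
    [ -z "$bad" ] && return 0
    printf 'staged without age encryption:\n%s\n' "$bad" >&2
    return 1
}
```

Placed in an executable .git/hooks/pre-commit script that ends with check_staged || exit 1, this would block a commit whenever the clean filter did not run. Hooks are per-clone as well, so it would need the same kind of setup step as the filters.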