Some notes on building an immutable OS on commodity hardware

A customer needed an OS image for a digital signage project: basically a display in a strange format with a built-in Celeron PC. The content comes from a web server and is displayed by Chrome in kiosk mode. The OS base is Ubuntu, because that's what the hardware vendor officially supports.

While building this, I ran into a number of uncharted areas in some key infrastructure components, in the sense that documentation gets rather sparse at the level where I was integrating things. That is the level where distribution maintainers and people writing installers or great tools like GRML work all the time. Hats off to them.

Overall design

The OS is packaged as SquashFS images, and two of those are installed on the machines -- the idea is, like with some embedded appliances, to update image #2 while running #1, and then boot over to the new one.

On top of the SquashFS is OverlayFS to make the whole thing writeable again -- because that means that we can run a pretty standard OS and don't need to go and configure, or worse, change and recompile a lot of software.
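The post doesn't show how the overlay is assembled at boot. As a minimal sketch, assuming the SquashFS image is already mounted read-only at a hypothetical /run/ro and a writable filesystem at a hypothetical /run/rw, the Python below only builds the mount(8) command line for an OverlayFS root; where and how the author's initramfs does this is not stated.

```python
from pathlib import PurePosixPath


def overlay_mount_argv(lower: str, rw_base: str, target: str) -> list[str]:
    """Build a mount(8) command that stacks a writable layer on a read-only one.

    lower:   mount point of the read-only SquashFS image (hypothetical path)
    rw_base: writable filesystem holding the upper and work directories
    target:  where the merged, writable root should appear
    """
    rw = PurePosixPath(rw_base)
    # upperdir receives all changes; workdir is OverlayFS scratch space and
    # must be an empty directory on the same filesystem as upperdir.
    options = ",".join([
        f"lowerdir={lower}",
        f"upperdir={rw / 'upper'}",
        f"workdir={rw / 'work'}",
    ])
    return ["mount", "-t", "overlay", "overlay", "-o", options, target]


if __name__ == "__main__":
    print(" ".join(overlay_mount_argv("/run/ro", "/run/rw", "/root")))
```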

Booting is via UEFI and grub2, allowing a seemingly easy switch between the images.

Building the OS image

That is the easy part. mmdebstrap is a nice and fast piece of software, its hook scripts allow for all the needed customization, and once it's done, you run mksquashfs and voilà.
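A sketch of the two build steps as command lines. The suite, package list, mirror, hook script and output paths are assumptions for illustration, not the author's actual build; mmdebstrap passes the root directory to a customize hook as its first argument. Feeding each list to subprocess.run(cmd, check=True) would run the build.

```python
import shlex


def build_commands(suite: str, rootfs: str, image: str,
                   packages: list[str], hook: str) -> list[list[str]]:
    """Return the build steps: bootstrap a root tree, then squash it."""
    bootstrap = [
        "mmdebstrap",
        "--variant=minbase",
        "--components=main,universe",
        "--include=" + ",".join(packages),
        # the hook script receives the root directory as $1
        f"--customize-hook={hook}",
        suite,
        rootfs,  # a directory target, so mksquashfs can read it afterwards
        "http://archive.ubuntu.com/ubuntu",
    ]
    squash = ["mksquashfs", rootfs, image, "-comp", "zstd", "-noappend"]
    return [bootstrap, squash]


if __name__ == "__main__":
    for cmd in build_commands("jammy", "build/rootfs", "build/display.squashfs",
                              ["lightdm", "ratpoison"],   # hypothetical list
                              "./hooks/customize.sh"):     # hypothetical hook
        print(shlex.join(cmd))
```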

The whole image is completely built from scratch every time, so you always get the newest security updates -- and hopefully, in conjunction with Ubuntu LTS, this will not be one of those OT projects which gets built once and still runs unchanged 10 years into the future (because I'm expecting this installation will live on way past my retirement age).

Configuring grub

Oh my. That's where the lack of documentation really starts to show.

What I want is a setup with two instances of grub plus grub.cfg in two separate directories on the EFI system partition (ESP), to be selected by the EFI boot manager (the one built into the UEFI firmware).

There are a few reasons for this uncommon setup:

You can then update grub in parallel with the OS image, and still have a fallback
You only have to write to the ESP if you're updating grub (and not on every image update), reducing the risk of a filesystem corruption
You can use the EFI "BootNext" functionality to switch from one image to the other, and only alter the permanent boot order once the new image has successfully booted
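The boot-switching idea from the list above can be sketched as efibootmgr command lines. This is a minimal illustration, not the author's tooling: the disk, partition number and labels are hypothetical, and the directory names follow the Partition-A/Partition-B layout used in the grub.cfg further down.

```python
import re

_BOOTNUM = re.compile(r"[0-9A-Fa-f]{4}")


def _bootnum(num: str) -> str:
    """Validate a Boot#### number as efibootmgr prints it, e.g. "0003"."""
    if not _BOOTNUM.fullmatch(num):
        raise ValueError(f"not an EFI boot number: {num!r}")
    return num.upper()


def create_entry(disk: str, esp_part: int, directory: str, label: str) -> list[str]:
    """Register EFI/<directory>/grubx64.efi without adding it to BootOrder."""
    loader = "\\EFI\\" + directory + "\\grubx64.efi"  # firmware paths use backslashes
    return ["efibootmgr", "--create-only", "--disk", disk, "--part", str(esp_part),
            "--label", label, "--loader", loader]


def try_next(candidate: str) -> list[str]:
    """One-shot boot: the firmware consumes BootNext on the next boot, so if
    the new image fails, the following reset uses the unchanged BootOrder."""
    return ["efibootmgr", "--bootnext", _bootnum(candidate)]


def commit(booted: str, fallback: str) -> list[str]:
    """Run from the new image once it is up: make it the permanent default."""
    return ["efibootmgr", "--bootorder", f"{_bootnum(booted)},{_bootnum(fallback)}"]
```

The intended flow: register both slots once, run try_next for the freshly written slot before rebooting, and let the new image run commit after it has come up successfully.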

You should just be able to drop a grubx64.efi into a directory on your EFI system partition, put a grub.cfg next to it, add an EFI boot entry into the EFI vars and you're good.

Well, no.

If you just do a grub-install from the chroot (the one we built with mmdebstrap), you'll get a grub.cfg which has the filesystem UUIDs and stuff from your build machine.

Also, you'll probably get the signed (Secure Boot) grub variant installed, which looks for its grub.cfg in EFI/ubuntu. You have to read the distro-patched sources to find this out: debian/build-efi-images is the script that shows which grub.cfg is embedded in the memdisk and what --prefix is set.

So grub-mkstandalone is the tool to use. It even has a manpage, but it isn't mentioned at all in the otherwise all-encompassing grub manual, so it's hard to see where it fits into the grand scheme of things.

What the manpage doesn't explicitly tell you is this:

grub-mkstandalone  --format x86_64-efi -o "$TARGET/boot/efi/EFI/BOOT/BOOTX64.EFI" "boot/grub/grub.cfg=$TARGET.grub.cfg"

See the boot/grub/grub.cfg=... argument at the end? That's how you put files into the memdisk: the left-hand side is the path inside the memdisk, the right-hand side is the source file.

So that grub.cfg gets run first, when the standalone image starts. Standalone, BTW, means it is grub + the memdisk with all of the modules and fonts and whatnot, so you need no filesystem access to run it.
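The "path-in-memdisk=source-file" convention above can be wrapped in a small helper that builds the grub-mkstandalone command. This is a sketch; the output and source paths in the example are hypothetical.

```python
import shlex


def mkstandalone_argv(output: str, memdisk_files: dict[str, str]) -> list[str]:
    """Build a grub-mkstandalone command that embeds extra files in the memdisk.

    memdisk_files maps path-inside-memdisk -> source file on the build host,
    which becomes one "left=right" argument per file.
    """
    argv = ["grub-mkstandalone", "--format", "x86_64-efi", "-o", output]
    for inside, source in memdisk_files.items():
        argv.append(f"{inside}={source}")
    return argv


if __name__ == "__main__":
    cmd = mkstandalone_argv(
        "esp/EFI/Partition-A/grubx64.efi",                 # hypothetical staging path
        {"boot/grub/grub.cfg": "build/memdisk-grub.cfg"},  # hypothetical source
    )
    print(shlex.join(cmd))
```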

This first grub.cfg in the memdisk ended up like this:

set prefix=(memdisk)/boot/grub
insmod part_gpt
if ! regexp '^\(hd[0-9]+,gpt[0-9]+\)/' $cmdpath; then
  if ! search --file --set=esp /efi/Partition-A/grub.cfg; then
    search --file --set=esp /efi/Partition-B/grub.cfg
  fi
  regexp -s 1:path '\)/(.*)$' $cmdpath
  set cmdpath='('${esp}')/'${path}
fi
configfile $cmdpath/grub.cfg

The reason is grub's $cmdpath variable, which is supposed to contain the path grub was loaded from. Except on this specific machine's firmware, where it is (hd0)/efi/Partition-A/ instead of the correct (hd0,gpt1)/... path. That's what the whole if block with the regexes is about: detect that case, search for the correct device and partition, and put it all back together.

Then, the configfile command just loads the grub.cfg from the correct directory on the ESP.
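To make the regex logic of that first grub.cfg easier to follow, here is the same repair mimicked in Python. It is an illustration only, not something grub runs. Python's re understands \d, but in grub's own regexp [0-9] is the safer spelling.

```python
import re


def fix_cmdpath(cmdpath: str, esp: str) -> str:
    """Mimic the memdisk grub.cfg: add the partition if the firmware left it out.

    cmdpath: what grub reported, e.g. "(hd0)/efi/Partition-A"
    esp:     device found by `search --set=esp`, e.g. "hd0,gpt1" (no parentheses)
    """
    if re.match(r"^\(hd\d+,gpt\d+\)/", cmdpath):
        return cmdpath  # already names a GPT partition, nothing to repair
    m = re.search(r"\)/(.*)$", cmdpath)  # everything after the device part
    if m is None:
        raise ValueError(f"unexpected cmdpath: {cmdpath!r}")
    return f"({esp})/{m.group(1)}"
```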

This second grub.cfg also needs some special handling, because I want to boot from a specific partition and not from a specific filesystem instance (which is what most standard grub configs do, with search --fs-uuid ...). Which is just as well, because SquashFS has neither a filesystem UUID nor a filesystem label (grub can search for either of those...).

And the partition (we're talking GPT here) has a UUID of its own, right? Well, it turns out grub has no search --part-uuid, but it does have probe --part-uuid, which lets you query the UUID of a given partition. At least grub has shell-like loops, so we can build our own search command:

set prefix=(memdisk)/boot/grub
insmod part_gpt
insmod squash4
insmod probe
insmod linux
insmod linuxefi
for disk in hd0 hd1 hd2; do
  for part in gpt2 gpt3; do
    probe --set puuid --part-uuid (${disk},${part})
    if [ "${puuid}" == "DISPLAY_PARTITION_UUID" ]; then
      set root=(${disk},${part})
      break 2
    fi
  done
done

set prefix=($root)'/boot/grub'
linux /boot/vmlinuz root=/dev/disk/by-partuuid/DISPLAY_PARTITION_UUID rw net.ifnames=0 quiet splash
initrd /boot/initrd.img
boot

You also need to load the linux and linuxefi modules explicitly: otherwise they're autoloaded from $prefix/..., and by the time the linux and initrd commands run, $prefix no longer points to the memdisk.

That DISPLAY_PARTITION_UUID is just a placeholder, which is replaced with the real UUID via sed at the time the image and the ESP contents are actually installed, because only then do we know which partition they go into. Also, I currently have those partition UUIDs set to fixed values.

The above also gives some ugly (hd0,gpt3) not found-style messages, but I'll deal with them later. Or just set the foreground color to the background color, we'll see.
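The placeholder substitution is done with sed in the post. As an alternative sketch in Python, the install step could read the target partition's PARTUUID with blkid and validate it before substituting, so a typo can't produce an unbootable grub.cfg. The device name in the usage line is hypothetical.

```python
import subprocess
import uuid

PLACEHOLDER = "DISPLAY_PARTITION_UUID"


def part_uuid(device: str) -> str:
    """Read a partition's GPT PARTUUID via blkid(8), e.g. part_uuid("/dev/sda2")."""
    result = subprocess.run(
        ["blkid", "-s", "PARTUUID", "-o", "value", device],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def render_grub_cfg(template: str, partuuid: str) -> str:
    """Replace every placeholder occurrence with a validated, lowercase UUID."""
    canonical = str(uuid.UUID(partuuid))  # raises ValueError on malformed input
    if PLACEHOLDER not in template:
        raise ValueError("template contains no placeholder")
    # grub.cfg uses the placeholder twice: in the probe loop and in root=
    return template.replace(PLACEHOLDER, canonical)
```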

Secure Boot

In this setup, the distribution-signed grub binaries with the shim won't help us, so we'd have to go with a Machine Owner Key (MOK) and sign our grub images with that. Then we could also use grub's GPG signature functionality to sign all of grub's components including grub.cfg (which the distribution-signed grub doesn't give us) plus the initrd (also only achievable this way, I think), and probably, with dm-verity, we could get a chain of trust from the firmware up to the running OS image.

But that involves a lot of key management, and no decision on it has been made yet.

X11 and Chrome

There's a lot of documentation and how-tos about running a kiosk-mode web browser on a Raspberry Pi, but everything I've found basically adds autologin and auto-launch to XFCE or whatever the desktop environment on the Pi is.

Since my goal is to minimize the installed software packages, I took another approach. I first naively tried to just launch X (directly, or via xinit), but that didn't go anywhere. So the solution is lightdm and lightdm-autologin-greeter, which does just what it says and needs minimal configuration:

[Seat:*]
autologin-user=username

# put any session from /usr/share/xsessions (strip .desktop from the file names there)
# here, if you want to run any other session than x-session-manager
autologin-session=chrome-session

chrome-session.desktop then is just an entry that starts, wait for it, ratpoison, because...

Weird display geometries

So, this signage device's display, as seen from the X server, is just a full HD LCD with the usual 1920x1080 resolution.

Physically, it is about 1/3 of that -- 1920 pixels wide, but only 357 pixels high.

The whole device is also mounted upside-down when installed in the field, but that is easily fixed with Option "Rotate" "inverted" in xorg.conf. Of course it also means that what is visible is now the lower 1/3rd of the 1080 virtual lines...

I've written my share of Xorg.conf files back in the day, heck, I even wrote a driver for a display once. But I haven't studied all of X's man pages as thoroughly as in the past days when I tried to get X to reduce the usable area on a monitor...

Just setting the virtual screen size doesn't work because then it doesn't fit onto the detected monitors anymore and X fails to start.

Modelines don't work, because at some point the display just doesn't seem to sync anymore as you approach the correct number of lines.

What also didn't work for me is the (again, barely documented) xrandr --setmonitor, which should have allowed me to split the screen into multiple virtual monitors.

In the end, I went for a tiling window manager. This seems to be a space for people's pet projects (more power to them!), but I found ratpoison, which has a simple config syntax and doesn't require you to write Lua or even Haskell just to configure the thing. Here's the config:

# be quiet and invisible:
set startupmessage 0
set border 0
set framemsgwait -1
# split at 2/3, lower portion is visible when inverted
vsplit 724
focus
# when the child exits, we do too, so X can be restarted:
addhook deletewindow quit
# run it!
exec google-chrome-wrapper

This splits the display at line 724, moves the focus to the lower frame, and then starts Chrome there.
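A quick check of the arithmetic, assuming the split at 724 means the lower frame starts at framebuffer row 724: unrotated, the panel shows the top 357 of 1080 rows, and after inversion it shows the bottom 357, starting at row 1080 - 357 = 723. A lower frame starting at 724 is 356 rows tall and lies entirely inside the visible strip, one row short of filling it.

```python
def visible_rows(total: int, visible: int, inverted: bool) -> range:
    """Framebuffer rows that are physically visible on the panel.

    Unrotated, the panel shows the top `visible` rows; with
    Option "Rotate" "inverted" those end up at the bottom.
    """
    return range(total - visible, total) if inverted else range(0, visible)


def lower_frame(total: int, split: int) -> range:
    """Rows covered by the lower frame when the screen is split at `split`."""
    return range(split, total)


if __name__ == "__main__":
    shown = visible_rows(1080, 357, inverted=True)
    frame = lower_frame(1080, 724)
    print(shown.start, len(frame), frame.start >= shown.start)  # prints: 723 356 True
```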

Chrome in kiosk mode doesn't actually try to cover the whole screen, just the area the window manager gives it, so this all works out fine.
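The chrome-session.desktop entry itself isn't shown. A minimal sketch of one, written from a build-time hook such as mmdebstrap's customize hook (which gets the root directory as its first argument): the file name without .desktop is what autologin-session refers to, and the entry simply starts ratpoison, which reads its config from the autologin user's ~/.ratpoisonrc by default. The Name value is made up.

```python
from pathlib import Path


def session_desktop(name: str, command: str) -> str:
    """Minimal X session entry of the kind lightdm reads from /usr/share/xsessions."""
    return "\n".join([
        "[Desktop Entry]",
        f"Name={name}",
        f"Exec={command}",
        "Type=Application",
        "",  # trailing newline
    ])


def install_session(root: str, session_id: str = "chrome-session") -> Path:
    """Write <root>/usr/share/xsessions/<session_id>.desktop into the image tree."""
    target = Path(root, "usr/share/xsessions", f"{session_id}.desktop")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(session_desktop("Chrome kiosk", "ratpoison"))
    return target
```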