edit: everything sucks. I just mounted USB storage for /home and /var, "classic" SysV-style. Boom, done.

* overlay is a PITA; it could probably be made to work, but zomg is it complex. Better if you don't actually want to preserve anything (like putting the upperdir on tmpfs)

* exotic filesystems don't really help

* with the write-heavy stuff (/home, /var) off the SD card, one can easily enough just remount the rest ro, and remount rw during upgrades/updates.

problem: a Raspberry Pi needs an SD card to boot* and SD cards aren't great at repeated writes. Eventually they die, and it sucks.

Raspberry Pi OS now has an overlay feature which can (sort of) let you use a read-only root on the SD card. I'm not using rpiOS tho, because ZFS is happier on vanilla 64-bit Ubuntu.

Ubuntu has an overlay option, but it's kind of a one-time deal; breaking out of it is *hard*.

What I really need:

  • Something VERY easy to manage. If I forget it's there, it should still work
  • Something that limits or eliminates writes to the SD card
  • but I should still be able to write to the card, e.g. for updates

So here's what *I* am doing:

  • remount /boot/firmware as read-only (a FAT FS for the kernel and initramfs)
  • boot as usual on the rw root fs, mounted on the SD card
  • in runlevel 3, remount high-usage dirs as overlays; use the local USB storage for the "upper"
  • provide a "sync" option which can rsync stuff from upper -> lower, e.g. following an apt upgrade
    • implied: /lower is remounted (mount --bind) in another part of the VFS

Gory details / notes for future self:

  • mount -t overlay overlay -o lowerdir=/home,upperdir=/mnt/overlay/home/upper,workdir=/mnt/overlay/home/work /home
  • # mount --bind /home /mnt/overlay/home/lower # does not work; or, only works if /home is a filesystem (not a subdir). If home is just part of a larger fs, use mount --bind / /mnt/overlay/lower/root and be clever about which of those are mucked-with
  • rsync -axXp /mnt/overlay/home/upper/ /mnt/overlay/lower/root/home/ # trailing slashes matter: sync the *contents* of upper. Note that if you delete a file in the overlay it is NOT deleted in the lower; therefore lower could get ugly (leftover files)
  • overlay doesn't work with ZFS (it can't host the upper or work dirs); ext4 works OK. I use LVM + 4 smallish partitions to host RAID1 volumes for /var and /home
  • /tmp can be tmpfs; 1GB is *plenty* on an 8GB rpi
  • most of the rest of the VFS is relatively static

Stratus is working really well. Phase II: use it for IPC (pub/sub store-n-forward)

Findings:

  • HTTPS continues to be nontrivial
    • need to push on this a little bit harder just to be sure
    • no hard requirement tho, so maybe screw it?
  • client-side caching is super effective; a ~1 KB cache stores 10 of everything (incl. Strings), at a ~10% performance cost vs. uncached access to an int32
  • A shared per-app secret is OK. Functionally similar to an API key / API secret pair: write requests are signed with the shared secret, and the API key is essentially the per-device GUID. Requires no per-device configuration in firmware.
  • Not attempted: a "configurator" program which would hold all GUIDs + per-device private keys and burn each key into EEPROM. Not clear whether this is required or even meaningful...
  • Timing doesn't (cannot) matter. Specifically, race conditions make fine-grained sequence-of-publication difficult to enforce; therefore it probably doesn't (shouldn't) matter in which order subscriptions are replayed

Concerns:

  • writes might be pretty slow, should look into caching / batching
    • publish() queue, flush with update()
  • no real threading available for asynchronous updates
  • pub/sub pattern is attractive, but (because there's no threading) the sub callback runs in the foreground during update(). pub might be queued or synchronous.

Core functionality / API:

  • subscribe(stream, callback, scope=PRIVATE, limit=none, reset=false)
    • for each event: callback(stream, data)
    • if specified, will replay previous events, up to limit events (oldest -> newest)
    • should be able to reset its access counter
    • should also be able to unsubscribe
      • a client-side activity; server doesn't care
  • publish(stream, data, scope=PRIVATE, ttl=SHORTISH, queue=false)
    • publishes String data to stream
      • this may trigger a subscribed callback, synchronously
    • if specified, can queue data for the next update()
  • update()
    • updates the configuration (accessed via get())
    • sends any queued data
    • downloads any subscription data
      • may trigger callbacks, synchronously
  • maybeUpdate(interval=get("REFRESH INTERVAL"))
    • helper function, trivial
    • if interval seconds have passed, update()
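To keep those semantics straight, here's a toy in-memory model of the API above (Python rather than MCU code; ToyClient, broker, and cursor are all made-up names, and the real thing talks HTTP). It shows the two behaviors called out earlier: queued publishes flush during update(), and subscription callbacks fire synchronously inside update().

```python
PRIVATE = "PRIVATE"

class ToyClient:
    """Toy model of the Stratus pub/sub semantics (not the real client)."""
    def __init__(self, broker):
        self.broker = broker   # shared dict: stream -> list of events
        self.subs = {}         # stream -> callback(stream, data)
        self.queued = []       # publishes waiting for the next update()
        self.cursor = {}       # stream -> count of events already replayed

    def subscribe(self, stream, callback, scope=PRIVATE):
        self.subs[stream] = callback
        self.cursor.setdefault(stream, 0)

    def publish(self, stream, data, scope=PRIVATE, queue=False):
        if queue:
            self.queued.append((stream, data))  # flushed by update()
        else:
            self.broker.setdefault(stream, []).append(data)

    def update(self):
        for stream, data in self.queued:        # send any queued data
            self.broker.setdefault(stream, []).append(data)
        self.queued.clear()
        for stream, cb in self.subs.items():    # replay oldest -> newest
            events = self.broker.get(stream, [])
            for data in events[self.cursor[stream]:]:
                cb(stream, data)                # synchronous callback
            self.cursor[stream] = len(events)

broker = {}
seen = []
a, b = ToyClient(broker), ToyClient(broker)
b.subscribe("temps", lambda stream, data: seen.append(data))
a.publish("temps", "21.5", queue=True)  # queued: not visible yet
b.update()
assert seen == []
a.update()                              # flush the queue
b.update()
assert seen == ["21.5"]
```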

Scope is just a channel, identified by a String. For simplicity any Stratus node can subscribe to any channel, but only one per subscribe() call (assuming the ACL permits it).

Publishing to a PRIVATE (unreadable) scope with a very long TTL is just a key/value store.

HTTP/1.1 & keepAlive (persistent TCP) would improve performance measurably. Does that matter?

  • annoying to implement (TCP client & socket server)
  • Do we care if the MCU pauses for a ~ second during an update?
  • Probably decide when HTTPClient stops working out (e.g. very long timeouts or something)

IOT MQ table would require:

  • GUID of the writer (ownership)
  • Scope (PRIVATE or other keyword)
  • key & value (both String, which can be interpreted later to whatever)
  • TTL
  • timestamp
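That table maps naturally onto a record type. A Python sketch, with field names guessed from the bullets above (not a real schema):

```python
import time
from dataclasses import dataclass, field

@dataclass
class MQRecord:
    """One row of the hypothetical IOT MQ table."""
    guid: str        # GUID of the writer (ownership)
    scope: str       # PRIVATE or another keyword
    key: str         # key & value are both Strings;
    value: str       # interpretation is up to the consumer
    ttl: float       # seconds to live
    timestamp: float = field(default_factory=time.time)

    def expired(self, now=None):
        """True once the record has outlived its TTL."""
        now = time.time() if now is None else now
        return now - self.timestamp > self.ttl

rec = MQRecord("0xDEADBEEF", "PRIVATE", "temp", "21.5",
               ttl=60, timestamp=100.0)
assert not rec.expired(now=150.0)  # 50s old: still live
assert rec.expired(now=161.0)      # 61s old: past the TTL
```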

Access control? A Stratus MQ has a name (URL) and a secret; access would also require a GUID. Do we need to further grant access by GUID (and therefore maintain a table of queue, GUID, secret, approve/deny)? Yes: this allows for publish-only things (data sources) and subscribe-only things (consumers). Implementation is (was) straightforward.

Implementation details:

  • publish: action=publish, fields: key, value, ttl, scope, IOTMQ, GUID, signature
  • subscribe: action=subscribe, fields: key, scope, IOTMQ, GUID, signature
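A sketch of how a signed request like that might be assembled. The hash (HMAC-SHA256) and the field ordering are my assumptions; the notes above don't pin down either:

```python
import hashlib
import hmac

def sign(fields: dict, shared_secret: str) -> str:
    # Sign the fields in a fixed (sorted-key) order so the server
    # can reproduce the digest. The ordering is an assumption.
    msg = "|".join(str(fields[k]) for k in sorted(fields))
    return hmac.new(shared_secret.encode(), msg.encode(),
                    hashlib.sha256).hexdigest()

def publish_request(key, value, ttl, scope, iotmq, guid, secret):
    fields = {"action": "publish", "key": key, "value": value,
              "ttl": ttl, "scope": scope, "IOTMQ": iotmq, "GUID": guid}
    fields["signature"] = sign(fields, secret)
    return fields

req = publish_request("temp", "21.5", 60, "PRIVATE",
                      "http://example.com/mq", "0xDEADBEEF", "secret key")
# Server side: recompute the signature from everything but the signature.
check = {k: v for k, v in req.items() if k != "signature"}
assert sign(check, "secret key") == req["signature"]
assert sign(check, "wrong secret") != req["signature"]
```

Note the shared-secret property from the findings above: the GUID travels in the clear as the "API key", and only holders of the per-app secret can produce a valid signature.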

Access records (approve & reject) can be stored in the queue (private scope to the server), with a longish TTL. Rejects would be consumed for logging purposes or for an admin console ("approve this access"). Approved access can be consumed to track subscription (in which case the TTL can be pretty short).

Do we need a TTL based on message depth? "store N messages for at most M seconds" No current use case.

Do we need default TTLs for queues (either #messages, or time-per-message) ? #messages over time? no use case.

After careful consideration I've decided that stratus are shitty clouds. They're grey and amorphous, can be low-flying, and are aesthetically unpleasing. In a not-unrelated vein I am pondering a thing that might serve as an IOT-friendly key:value datastore.

High-level requirements / goals:

  1. Easy to add to existing code. Minimal boilerplate, minimal RAM/CPU overhead.
  2. reasonably secure; an endpoint (IOT device) should be reasonably confident that it's getting unmolested data
    1. note that this shouldn't protect against MITM or anything sophisticated
    2. note also that you shouldn't use this for anything important like heart monitors
  3. dead-ass simple. IOT devices suck and have limited CPU/RAM/connectivity resources
    1. unfortunately this pretty much eliminates active crypto
    2. hard-coded & configurable API keys would be just fine tho
    3. shared-secret check bits might work ok (actually they'd be great with a decent hashing algo)
  4. massively tolerant of failure
  5. self-configurable / zero bootstrap

So here's a starting place, read-only:

  • some ideas are stupid, but I don't want to forget that they're stupid
  • hosted via simple static/flat text file with key: value\n pairs
    • build static config files offline, signed
  • simple strnpos (or similar) to identify key and end-of-key "\n", substr() to extract it
  • "API key" as an n-bit hex value
    • this is nontrivial. IOT code is distributed en masse, per-device configs are infeasible
    • each host can have a GUID (MAC address or derived from)
  • host the static text file in a web space under that key, making it hard to find
    • /df/0xDEADBEEF.txt
  • Use a key like "where my config should live". If you need to move it for any reason, just update that key and the thing would next pull from the new location. This key should mostly be self-referential tho (duh)
  • Use a key for the API key too, in case you have to migrate that for any reason
  • an IOT device should also check for its own GUID as a key, possibly indicating a new config location
    • to let one retroactively split off some clients, or per-client config specialization
  • Other self-referential config things should include update frequency and debug / logging levels
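The flat key: value\n format is trivial to parse. A Python sketch of the logic (the real client would use strnpos/substr on an Arduino String, but it's the same idea; the key names here are illustrative, not canonical):

```python
def parse_config(text):
    """Parse 'key: value' lines into a dict; skip blanks and comments."""
    config = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        key, sep, value = line.partition(":")  # split at the first colon
        if sep:
            config[key.strip()] = value.strip()
    return config

sample = """config url: http://austindavid.com/df/0xDEADBEEF.txt
refresh interval: 300
debug level: 2
"""
cfg = parse_config(sample)
assert cfg["refresh interval"] == "300"
# Self-referential relocation: the next fetch comes from wherever
# "config url" points (here, back at itself).
assert cfg["config url"].endswith("0xDEADBEEF.txt")
```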

If done correctly one could debug production device(s), ship them, disable debugging, and later re-enable it -- without requiring a code push / OTA.

#include "stratus.h"
Stratus stratus("http://austindavid.com:80/df/df.txt", "secret key");
...
void setup() {
  // networking setup happens
  stratus.update();
}

void loop() {
  static int variable = 1;
  EVERY_N_SECONDS(stratus.get("refresh interval", 300)) {
    stratus.update();
    variable = stratus.get("variable", variable);
  }
}

Doing a little better (potential TODO):

  • implement a pub-sub; URL -> MQ, GUID + secret for authorization & authentication
  • for more arbitrary-sized objects (BLOBs), HTTP GET a single key / retrieve a single (biggish) value?
  • use the datastore -- HTTP POST a key + value pub
  • have clients pump a checkin value to indicate when they last successfully read the sub

I hope the ads aren't too annoying. I enabled what I HOPE will be unobtrusive ads across all platforms, placed in the non-content areas of the pages. If they get out of hand, please let me know. Your ad blocker should entirely hide them if needed.

I choose to run the ads to finance the site. My net hosting costs are on the order of $200/yr, and ad traffic roughly covers that -- with most of the volume going to my virtual hat around Christmas time. Unfortunately the volume has tailed off in the last year, so I'm exploring options for improving things: higher-quality placement (and therefore higher revenue per ad) and more prominent placement, beyond the virtual hat.

One of my boys is homeschooled, and even before that he did well with some warm-up / practice worksheets at home. I made randomized worksheets in Google Sheets that let us print out the same types of problems but with different values, because he could memorize them (at least the earlier, simpler ones). These are targeted at a 7th grade program but likely apply to 6th and 8th grades.

I'm writing this because I spent too much time puzzling over it until I found a few gems. I probably won't use this IRL because it's clunky (and I think I have a way I prefer), but here's ACTUALLY how to write a JSON-serializable class, and why.

The official JSONEncoder docs say: "To use a custom JSONEncoder subclass (e.g. one that overrides the default() method to serialize additional types), specify it with the cls kwarg; otherwise JSONEncoder is used." Less-obvious (at least to me): JSONEncoder can *only* encode native Python types (list, dict, int, string, tuple ... that sort of thing); if you want to encode any other class, you *must* specify cls= in the dump/load methods, and you must provide a JSONEncoder subclass.

Specifically, json.dumps(data, cls=CustomEncoder), where CustomEncoder is a subclass of JSONEncoder with a default(self, obj) method that recognizes your class:

class SpecialData:
    def __init__(self, data):
        self.data = data

class SpecialDataEncoder(JSONEncoder):
    def default(self, obj):
        if isinstance(obj, SpecialData):
            return { 'data': obj.data }  # or whatever serializable native type
        return super().default(obj)  # raises TypeError for anything else

Serialize with: json.dumps(specialData, cls=SpecialDataEncoder)

This "works," but it broke down (for me) when I wanted to store a class inside a dict and serialize that dict: json.dumps(the_dict) fails if the dict contains a special class, unless the caller passes cls=. The class doing the serialization has to know what it's serializing (so it can specify that keyword), but I don't want that level of coupling in my code. Generally this will also fail if you have more than one special, unrelated class to serialize. Yes, you could write one custom encoder class which handles all of them (sort of a Mediator), but c'mon.

What I did instead (and what will probably change): I provided my to-be-serialized class with "serialize" and "deserialize" methods; they return or take simple JSON-friendly data. The enclosing class (PersistentDict) now accepts a "cls=" argument. New values in that dict would be instantiated in this class, and on read/write all values are pre-serialized or post-deserialized for sending down to json's dump/load methods.
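A sketch of that approach, with assumed names (serialize/deserialize on the value class, cls= on PersistentDict) matching the description; the real PersistentDict presumably also writes to disk:

```python
import json
from json import JSONEncoder  # only stdlib json is used here

class Point:
    """Example to-be-serialized class with serialize/deserialize."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def serialize(self):
        return {"x": self.x, "y": self.y}  # JSON-friendly data only

    @classmethod
    def deserialize(cls, data):
        return cls(data["x"], data["y"])

class PersistentDict(dict):
    """Dict that pre-serializes values on dump, post-deserializes on load,
    so plain json.dumps/loads never see a custom class."""
    def __init__(self, cls=None):
        super().__init__()
        self._cls = cls

    def dumps(self):
        return json.dumps({k: v.serialize() for k, v in self.items()})

    @classmethod
    def loads(cls, text, value_cls):
        d = cls(cls=value_cls)
        for k, v in json.loads(text).items():
            d[k] = value_cls.deserialize(v)
        return d

d = PersistentDict(cls=Point)
d["home"] = Point(1, 2)
text = d.dumps()                        # json only ever sees native types
d2 = PersistentDict.loads(text, Point)
assert (d2["home"].x, d2["home"].y) == (1, 2)
```

The point of the design: json never needs a custom encoder at all, and the dict can hold any class that implements serialize/deserialize, unrelated or not.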

Working on my cluster-backups I wanted an RPC-like mechanism for communication from client -> server, but I required decoupled operation of the clients and server. I wanted to "magically" serialize data of arbitrary length, simplify the logic (abstracting out checking for error conditions etc), but if possible maintain long-lived TCP connections between client and server.

Above all else, code using datagrams should be very natural, readable, and very easy to use.

A simple "echo" server:

from datagram import *

s = DatagramServer("localhost", 5005)

while True:
    with s.accept() as datagram:
        message = datagram.value()
        datagram.send(message.upper())

The client:

from datagram import *

buffer = "a" * 1000

with Datagram(buffer, server="localhost", port=5005) as datagram:
    if datagram.send(): # or send(server="localhost", port=5000)
        print(datagram.receive())

The Datagram will evaluate True or False based on the most recent connection, so "if datagram:" reads naturally. datagram.value() returns its deserialized contents (whatever was most recently sent or received). len() and in treat it like a list, and it's iterable: "for token in datagram" and "if token in datagram" both work.
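A minimal sketch of the framing that makes "arbitrary length" work over a stream socket. This is not the real datagram module; the 4-byte big-endian length prefix and JSON payload are assumptions on my part:

```python
import json
import socket
import struct

def send_datagram(sock, obj):
    """Serialize obj and send it with a 4-byte length prefix."""
    payload = json.dumps(obj).encode()
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n):
    """Read exactly n bytes, looping over short reads."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf += chunk
    return buf

def recv_datagram(sock):
    """Read one length-prefixed message and deserialize it."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode())

# Round-trip over a local socket pair:
a, b = socket.socketpair()
send_datagram(a, {"msg": "a" * 1000})
assert recv_datagram(b) == {"msg": "a" * 1000}
a.close(); b.close()
```

Because each message carries its own length, the same TCP connection can stay open for many request/response exchanges, which is the long-lived-connection goal above.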