io-util

Utility modules, a dependency-free JSON module, and a fixed-width external (on-disk) table module.


base64-32

A base64 variant for encoding 32-byte values as text. Its present use case is recording SHA-256 values in relational databases. It uses the URL- and filename-safe character set specified in §5 of RFC 4648, but takes liberties (discussed though not condoned in §3.2) in specifying boundary conditions so that no padding is required for our use case.

Format

As usual, the encoding maps a sequence of 8-bit bytes to a sequence of 6-bit values. A 32-byte value is 256 bits, so 43 six-bit values (258 bits) are needed, which leaves 2 bits of padding. The padding goes at the front, in the first 6-bit value, which is initialized as follows:

Consider a protocol in which we prepend an extra zero byte to every 32-byte sequence: the resulting 33-byte sequence would map to exactly 44 base64 characters. The first character of every such base64 sequence would be redundant, however, since its value would always be zero (the character 'A'). Our protocol, then, requires that we drop this leftmost character, leaving 43.
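The prepend-and-drop description above can be sketched with Python's standard `base64` module. This is only an illustration of the format, not the library's own implementation; the function name `encode32` is hypothetical.

```python
import base64
import hashlib

def encode32(value: bytes) -> str:
    """Encode exactly 32 bytes as 43 URL-safe base64 characters (illustrative sketch)."""
    if len(value) != 32:
        raise ValueError("expected exactly 32 bytes")
    # 33 bytes = 264 bits = 44 six-bit groups, so no '=' padding is produced.
    full = base64.urlsafe_b64encode(b"\x00" + value).decode("ascii")
    # The first group holds 6 of the 8 prepended zero bits, so it is always 'A'.
    assert full[0] == "A"
    # Drop the redundant leading character.
    return full[1:]

if __name__ == "__main__":
    digest = hashlib.sha256(b"hello").digest()
    text = encode32(digest)
    print(len(text), text)
```

The remaining 2 zero bits of the prepended byte end up as the high bits of the first retained character.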

More concretely, consider a queue of bits that is appended to on the right in chunks of 8 (one byte) and consumed from the left in chunks of 6 (one base64 character). Initialization, then, consists of filling this bit queue with 2 zero bits, followed by the 8 bits of the first byte.
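A minimal sketch of the bit-queue view, assuming the RFC 4648 §5 alphabet. The names `ALPHABET` and `encode32_bitqueue` are hypothetical and are not taken from the library.

```python
# URL- and filename-safe alphabet from RFC 4648, section 5.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

def encode32_bitqueue(value: bytes) -> str:
    """Encode 32 bytes by feeding a bit queue that starts with 2 zero bits."""
    if len(value) != 32:
        raise ValueError("expected exactly 32 bytes")
    queue = 0   # pending bits, stored in the low end of an int
    nbits = 2   # the queue starts out holding 2 zero bits
    out = []
    for byte in value:
        # Append 8 bits on the right.
        queue = (queue << 8) | byte
        nbits += 8
        # Consume 6 bits at a time from the left.
        while nbits >= 6:
            nbits -= 6
            out.append(ALPHABET[(queue >> nbits) & 0x3F])
            queue &= (1 << nbits) - 1
    # 2 + 32 * 8 = 258 = 43 * 6, so the queue is empty here.
    return "".join(out)
```

Because 2 + 256 is a multiple of 6, the queue drains exactly and no trailing padding is needed.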

One-to-One

Implementation-wise, a decoder could ignore the first 2 bits (i.e. treat them as zero even if the first base64 character read sets them). However, we require that the padding bits be zero in order to maintain a one-to-one mapping.
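A strict decoder along these lines might look as follows. This is a sketch under the assumptions stated in the comments; `decode32`, `ALPHABET`, and `INDEX` are hypothetical names, and raising `ValueError` on bad input is a choice made for this illustration.

```python
# URL- and filename-safe alphabet from RFC 4648, section 5.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def decode32(text: str) -> bytes:
    """Decode 43 characters to 32 bytes, rejecting non-zero padding bits."""
    if len(text) != 43:
        raise ValueError("expected exactly 43 characters")
    acc = 0
    for ch in text:
        if ch not in INDEX:
            raise ValueError(f"illegal character: {ch!r}")
        acc = (acc << 6) | INDEX[ch]
    # acc holds 43 * 6 = 258 bits; the top 2 are the padding bits.
    if acc >> 256:
        raise ValueError("padding bits must be zero")
    return acc.to_bytes(32, "big")
```

With this check, a string such as `"Q" + "A" * 42` is rejected; a lenient decoder would instead map it to the same 32 zero bytes as `"A" * 43`, giving two encodings for one value.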

Pros & Cons

Compiled here to inform possible adoption.

Pros

  • Compact. 43 characters versus 64 for hex, a savings of about one third.
  • URL friendly. Does not need URL encoding.

Cons

  • Broken double-click selection. Double-clicking selects the whole value as one word only if it is entirely alphanumeric; the ‘-’ character is not alphanumeric, so it breaks the selection.
  • Not widely known.

Remarks

See also Base58Check encoding.