<style type=text/css>tt{font-weight:700}</style><blog-article posted=2023-04-18T02:53:51Z><h1 slot=title>Creating TUN/TAP interfaces in Linux</h1><div slot=tableofcontents></div><div slot=summary><p>The basic approach to writing a TUN/TAP client (such as a VPN) for Linux is:</p><ol><li>Open the <tt>/dev/net/tun</tt> device as a file, which (once configured) will communicate network traffic to userspace.</li><li>Allocate (or bind) a virtual network interface with the file handle using <tt>ioctl(TUNSETIFF)</tt>.</li><li>Configure the network interface's address and link state.</li><li>Process network traffic in the userspace program.</li></ol><p>There's reasonably complete documentation about each step of this process, but I couldn't find a worked example that tied it all together. The following C program is intended to serve as a minimal TUN/TAP client.</p></div><blog-section><h2 slot=title>Steps 1-2: Allocating a TUN/TAP interface</h2><p>Opening a file is straightforward, so the important part of this function is the <tt>ioctl(TUNSETIFF)</tt> call. It's this call that creates the network interface, and there are two user-configurable fields:</p><ul><li>The <tt>ifr_name</tt> field contains the interface name, which may be specified by the caller. If unset (empty), then the kernel will assign a name such as <tt>tun0</tt> or <tt>tap0</tt>.</li><li>The <tt>ifr_flags</tt> field selects whether to create a TUN or TAP interface. TUN interfaces process IP packets, and TAP interfaces process Ethernet frames.</li></ul><p>The set of possible flags and their effects is documented at <a href=https://docs.kernel.org/networking/tuntap.html>Linux Networking Documentation » Universal TUN/TAP device driver</a>.</p><p>The interface name, if provided, must be less than <tt>IFNAMSIZ</tt> bytes. After the ioctl call returns, the <tt>ifr_name</tt> field can be inspected to see what name the interface was created with.</p><blog-code syntax=c><pre>
/* Copyright (c) John Millikin <john@john-millikin.com> */
/* SPDX-License-Identifier: 0BSD */
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
int tuntap_connect(const char *iface_name, short flags, char *iface_name_out) {
    int tuntap_fd, rc;
    size_t iface_name_len;
    struct ifreq setiff_request;
    if (iface_name != NULL) {
        iface_name_len = strlen(iface_name);
        if (iface_name_len >= IFNAMSIZ) {
            errno = EINVAL;
            return -1;
        }
    }
    tuntap_fd = open("/dev/net/tun", O_RDWR | O_CLOEXEC);
    if (tuntap_fd == -1) {
        return -1;
    }
    memset(&setiff_request, 0, sizeof setiff_request);
    setiff_request.ifr_flags = flags;
    if (iface_name != NULL) {
        memcpy(setiff_request.ifr_name, iface_name, iface_name_len + 1);
    }
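    /* This ioctl creates the interface. On success the kernel writes the
       interface's actual name (which may have been kernel-assigned) back
       into ifr_name; it is copied out to the caller below. */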
    rc = ioctl(tuntap_fd, TUNSETIFF, &setiff_request);
    if (rc == -1) {
        int ioctl_errno = errno;
        close(tuntap_fd);
        errno = ioctl_errno;
        return -1;
    }
    if (iface_name_out != NULL) {
        memcpy(iface_name_out, setiff_request.ifr_name, IFNAMSIZ);
    }
    return tuntap_fd;
}</pre></blog-code></blog-section><blog-section><h2 slot=title>Step 3: Configure the interface with Netlink</h2><p>At this point, most TUN/TAP examples I've found tell the user to configure the newly-created network interface by using the command line to run tools from <a href=https://wiki.linuxfoundation.org/networking/iproute2>iproute2</a>. In this post I will instead use the Linux kernel's native <a href=https://docs.kernel.org/userspace-api/netlink/intro.html>Netlink</a> subsystem.</p><p>Netlink can be thought of as a sort of RPC-ish request/response protocol, where messages are assembled manually from C structs. Besides the kernel docs linked above, the following manpages are useful for writing a Netlink client:</p><ul><li><a href=https://man7.org/linux/man-pages/man3/netlink.3.html>netlink(3)</a></li><li><a href=https://man7.org/linux/man-pages/man7/netlink.7.html>netlink(7)</a></li><li><a href=https://man7.org/linux/man-pages/man7/rtnetlink.7.html>rtnetlink(7)</a></li></ul><p>In this example we will be using the <tt>NETLINK_ROUTE</tt> mode to send <tt>RTM_NEWADDR</tt> and <tt>RTM_NEWLINK</tt> requests. Netlink error handling is a bit obtuse since it requires manual response handling, so I'm not going to bother with it for this example.</p><p>The first step is to open an <tt>AF_NETLINK</tt> socket by calling <tt>socket(AF_NETLINK)</tt>. I'm also calling <tt>bind()</tt>, which isn't strictly necessary but provides metadata useful to <tt>strace</tt><blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref>.</p><blog-code syntax=c><pre>
/* Copyright (c) John Millikin <john@john-millikin.com> */
/* SPDX-License-Identifier: 0BSD */
#include <arpa/inet.h>
#include <errno.h>
#include <linux/if.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <net/if.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
int netlink_connect() {
    int netlink_fd, rc;
    struct sockaddr_nl sockaddr;
    netlink_fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
    if (netlink_fd == -1) {
        return -1;
    }
    memset(&sockaddr, 0, sizeof sockaddr);
    sockaddr.nl_family = AF_NETLINK;
    rc = bind(netlink_fd, (struct sockaddr*) &sockaddr, sizeof sockaddr);
    if (rc == -1) {
        int bind_errno = errno;
        close(netlink_fd);
        errno = bind_errno;
        return -1;
    }
    return netlink_fd;
}</pre></blog-code><p>The first Netlink command will be <tt>RTM_NEWADDR</tt>, which sets the address and prefix length (netmask) of the interface. I've only implemented IPv4 support for this example, but IPv6 is similar.</p><p>A Netlink request contains a header (<tt>struct nlmsghdr</tt>), message content (here that's a <tt>struct ifaddrmsg</tt>), and an optional list of key-value attributes. The set of necessary attributes isn't well documented, so I ran <tt>strace ip addr add</tt> and replicated its requests.</p><blog-code syntax=c><pre data-start=32>
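/*
 * Optional helper (my addition, not part of the original post): the post
 * skips Netlink error handling, but if a request is sent with NLM_F_ACK
 * added to nlmsg_flags, the kernel answers with an NLMSG_ERROR message
 * whose error field is 0 on success and a negated errno on failure. A
 * minimal acknowledgment reader looks like this:
 */
int netlink_read_ack(int netlink_fd) {
    char resp[4096];
    struct nlmsghdr *header;
    struct nlmsgerr *err;
    ssize_t len;
    len = recv(netlink_fd, resp, sizeof resp, 0);
    if (len == -1) {
        return -1;
    }
    header = (struct nlmsghdr *) resp;
    if (!NLMSG_OK(header, len) || header->nlmsg_type != NLMSG_ERROR) {
        errno = EBADMSG;
        return -1;
    }
    err = (struct nlmsgerr *) NLMSG_DATA(header);
    if (err->error != 0) {
        errno = -err->error;
        return -1;
    }
    return 0;
}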
int netlink_set_addr_ipv4(
    int netlink_fd
    , const char *iface_name
    , const char *address
    , uint8_t network_prefix_bits
) {
    struct {
        struct nlmsghdr header;
        struct ifaddrmsg content;
        char attributes_buf[64];
    } request;
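    /* Annotation (mine, not from the original post): the finished request
     * is 40 bytes, matching nlmsg_len=40 in the strace dump in footnote 1:
     *   struct nlmsghdr    16 bytes
     *   struct ifaddrmsg    8 bytes
     *   rtattr IFA_LOCAL    4-byte header + 4-byte struct in_addr
     *   rtattr IFA_ADDRESS  4-byte header + 4-byte struct in_addr */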
    struct rtattr *request_attr;
    size_t attributes_buf_avail = sizeof request.attributes_buf;
    memset(&request, 0, sizeof request);
    request.header.nlmsg_len = NLMSG_LENGTH(sizeof request.content);
    request.header.nlmsg_flags = NLM_F_REQUEST | NLM_F_EXCL | NLM_F_CREATE;
    request.header.nlmsg_type = RTM_NEWADDR;
    request.content.ifa_index = if_nametoindex(iface_name);
    request.content.ifa_family = AF_INET;
    request.content.ifa_prefixlen = network_prefix_bits;
    /* request.attributes[IFA_LOCAL] = address */
    request_attr = IFA_RTA(&request.content);
    request_attr->rta_type = IFA_LOCAL;
    request_attr->rta_len = RTA_LENGTH(sizeof (struct in_addr));
    request.header.nlmsg_len += request_attr->rta_len;
    inet_pton(AF_INET, address, RTA_DATA(request_attr));
    /* request.attributes[IFA_ADDRESS] = address */
    request_attr = RTA_NEXT(request_attr, attributes_buf_avail);
    request_attr->rta_type = IFA_ADDRESS;
    request_attr->rta_len = RTA_LENGTH(sizeof (struct in_addr));
    request.header.nlmsg_len += request_attr->rta_len;
    inet_pton(AF_INET, address, RTA_DATA(request_attr));
    if (send(netlink_fd, &request, request.header.nlmsg_len, 0) == -1) {
        return -1;
    }
    return 0;
}</pre></blog-code><p>The second Netlink command uses <tt>RTM_NEWLINK</tt> to enable the interface. It's equivalent to running <tt>ip link set up</tt>.</p><blog-code syntax=c><pre data-start=75>
int netlink_link_up(int netlink_fd, const char *iface_name) {
    struct {
        struct nlmsghdr header;
        struct ifinfomsg content;
    } request;
    memset(&request, 0, sizeof request);
    request.header.nlmsg_len = NLMSG_LENGTH(sizeof request.content);
    request.header.nlmsg_flags = NLM_F_REQUEST;
    request.header.nlmsg_type = RTM_NEWLINK;
    request.content.ifi_index = if_nametoindex(iface_name);
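    /* ifi_change is a mask of which ifi_flags bits to update; IFF_UP is
       bit 0x1, so this request changes only the interface's up/down state. */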
    request.content.ifi_flags = IFF_UP;
    request.content.ifi_change = 1;
    if (send(netlink_fd, &request, request.header.nlmsg_len, 0) == -1) {
        return -1;
    }
    return 0;
}</pre></blog-code><p>At this point the TUN/TAP interface has been fully configured and is just waiting for our process to read/write network data.</p></blog-section><blog-section><h2 slot=title>Step 4: Process network traffic</h2><p>For this example I'll be writing a very simple <tt>tun2udp</tt> binary, which forwards IPv4 packets to/from UDP on localhost. Compile it with GCC or Clang:</p><blog-code syntax=commands><pre>
gcc -o tun2udp tun2udp.c
send_port=12345
recv_port=12346
sudo ./tun2udp 10.11.12.0/24 $send_port $recv_port</pre></blog-code><p></p><blog-code syntax=c><pre>
/* Copyright (c) John Millikin <john@john-millikin.com> */
/* SPDX-License-Identifier: 0BSD */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
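/*
 * Smoke test (my sketch, not from the original post): with tun2udp running
 * as shown above, `ping 10.11.12.99` routes ICMP echo requests into the
 * TUN interface, and each one is forwarded as a raw IPv4 packet to UDP
 * port 12345; a listener such as `nc -u -l 12345 | xxd` will show the
 * packets arriving.
 */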
int run_proxy(int tuntap_fd, int send_fd, int recv_fd) {
    struct pollfd poll_fds[2];
    char recv_buf[UINT16_MAX];
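    /* recv_buf holds 65,535 bytes: the maximum size of an IPv4 packet
       (its total-length header field is 16 bits), so a single read() from
       the TUN device can never truncate a packet. */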
    poll_fds[0].fd = tuntap_fd;
    poll_fds[0].events = POLLIN;
    poll_fds[1].fd = recv_fd;
    poll_fds[1].events = POLLIN;
    while (1) {
        if (poll(poll_fds, 2, -1) == -1) {
            return -1;
        }
        if ((poll_fds[0].revents & POLLIN) != 0) {
            ssize_t count = read(tuntap_fd, recv_buf, UINT16_MAX);
            if (count < 0) {
                return -1;
            }
            send(send_fd, recv_buf, count, 0);
        }
        if ((poll_fds[1].revents & POLLIN) != 0) {
            ssize_t count = recv(recv_fd, recv_buf, UINT16_MAX, 0);
            if (count < 0) {
                return -1;
            }
            if (write(tuntap_fd, recv_buf, count) == -1) {
                return -1;
            }
        }
    }
    return 0;
}
int connect_localhost_udp(uint16_t port) {
    int fd, rc;
    struct sockaddr_in addr;
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1) {
        return -1;
    }
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = inet_addr("127.0.0.1");
    rc = connect(fd, (struct sockaddr*) &addr, sizeof addr);
    if (rc == -1) {
        int connect_errno = errno;
        close(fd);
        errno = connect_errno;
        return -1;
    }
    return fd;
}
int bind_localhost_udp(uint16_t port) {
    int fd, rc;
    struct sockaddr_in addr;
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1) {
        return -1;
    }
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = inet_addr("127.0.0.1");
    rc = bind(fd, (struct sockaddr*) &addr, sizeof addr);
    if (rc == -1) {
        int bind_errno = errno;
        close(fd);
        errno = bind_errno;
        return -1;
    }
    return fd;
}</pre></blog-code><p>The rest of the code is just argument parsing. For the TUN interface address it accepts an IPv4 dotted quad, with an optional netmask (defaulting to <tt>/32</tt>).</p><blog-code syntax=c><pre data-start=93>
int split_address(char *address_str, uint8_t *network_prefix_bits) {
    char *prefix_sep, *prefix_str;
    prefix_sep = strchr(address_str, '/');
    if (prefix_sep == NULL) {
        prefix_str = NULL;
        *network_prefix_bits = 32;
    } else {
        *prefix_sep = 0;
        prefix_str = prefix_sep + 1;
    }
    if (inet_addr(address_str) == INADDR_NONE) {
        if (prefix_sep != NULL) {
            *prefix_sep = '/';
        }
        return -1;
    }
    if (prefix_str != NULL) {
        char *prefix_extra;
        long prefix_raw = strtol(prefix_str, &prefix_extra, 10);
        if (prefix_raw < 0 || prefix_raw > 32) {
            *prefix_sep = '/';
            return -1;
        }
        if (*prefix_extra != 0) {
            *prefix_sep = '/';
            return -1;
        }
        *network_prefix_bits = prefix_raw;
    }
    return 0;
}
int parse_port(char *port_str, uint16_t *port) {
    char *extra;
    long raw = strtol(port_str, &extra, 10);
    if (raw < 0 || raw > UINT16_MAX) {
        return -1;
    }
    if (*extra != 0) {
        return -1;
    }
    *port = raw;
    return 0;
}</pre></blog-code><p>Finally we get to <tt>main()</tt> and can glue everything together. Copy (or <tt>#include</tt>) the TUN/TAP and Netlink code from earlier sections. The TUN/TAP flags are hardcoded to <tt>IFF_TUN | IFF_NO_PI</tt>, which means it will send/receive IP packets with no additional framing. The interface name will be assigned by the kernel.</p><blog-code syntax=c><pre data-start=141>
int main(int argc, char **argv) {
    int tuntap_fd, netlink_fd, send_fd, recv_fd, rc;
    char iface_name[IFNAMSIZ];
    char *address;
    uint8_t prefix_bits;
    uint16_t send_port, recv_port;
    if (argc < 4) {
        fprintf(stderr, "Usage: %s <address> <send-port> <recv-port>\n", argv[0]);
        return 1;
    }
    address = argv[1];
    if (split_address(address, &prefix_bits) == -1) {
        fprintf(stderr, "Invalid address \"%s\"\n", argv[1]);
        return 1;
    }
    if (parse_port(argv[2], &send_port) == -1) {
        fprintf(stderr, "Invalid port \"%s\"\n", argv[2]);
        return 1;
    }
    if (parse_port(argv[3], &recv_port) == -1) {
        fprintf(stderr, "Invalid port \"%s\"\n", argv[3]);
        return 1;
    }
    send_fd = connect_localhost_udp(send_port);
    if (send_fd == -1) {
        fprintf(stderr, "connect_localhost_udp(%u): ", send_port);
        perror(NULL);
        return 1;
    }
    recv_fd = bind_localhost_udp(recv_port);
    if (recv_fd == -1) {
        fprintf(stderr, "bind_localhost_udp(%u): ", recv_port);
        perror(NULL);
        return 1;
    }
    tuntap_fd = tuntap_connect(NULL, IFF_TUN | IFF_NO_PI, iface_name);
    if (tuntap_fd == -1) {
        perror("tuntap_connect");
        return 1;
    }
    netlink_fd = netlink_connect();
    if (netlink_fd == -1) {
        perror("netlink_connect");
        return 1;
    }
    rc = netlink_set_addr_ipv4(netlink_fd, iface_name, address, prefix_bits);
    if (rc == -1) {
        perror("netlink_set_addr_ipv4");
        return 1;
    }
    rc = netlink_link_up(netlink_fd, iface_name);
    if (rc == -1) {
        perror("netlink_link_up");
        return 1;
    }
    close(netlink_fd);
    if (run_proxy(tuntap_fd, send_fd, recv_fd) == -1) {
        perror("run_proxy");
        return 1;
    }
    return 0;
}</pre></blog-code></blog-section><blog-footnotes><hr><ol><li id=fn:1><p>If the Netlink socket has <tt>bind()</tt> called on it, then the traced <tt>RTM_NEWADDR</tt> command is formatted like this:</p><blog-code><pre>
sendto(6, [
    {
        nlmsg_len=40,
        nlmsg_type=RTM_NEWADDR,
        nlmsg_flags=NLM_F_REQUEST|NLM_F_EXCL|NLM_F_CREATE,
        nlmsg_seq=0,
        nlmsg_pid=0
    }, {
        ifa_family=AF_INET,
        ifa_prefixlen=24,
        ifa_flags=0,
        ifa_scope=RT_SCOPE_UNIVERSE,
        ifa_index=if_nametoindex("tun0")
    }, [
        [{nla_len=8, nla_type=IFA_LOCAL}, inet_addr("10.10.0.1")],
        [{nla_len=8, nla_type=IFA_ADDRESS}, inet_addr("10.10.0.1")]
    ]
], 40, 0, NULL, 0) = 40</pre></blog-code><p>If the socket does not have <tt>bind()</tt> called on it, then the same command is formatted like this:</p><blog-code><pre>
sendto(6, [
    {
        nlmsg_len=40,
        nlmsg_type=0x14 /* NLMSG_??? */,
        nlmsg_flags=NLM_F_REQUEST|0x600,
        nlmsg_seq=0,
        nlmsg_pid=0
    }, "\x02\x18\x00\x00\x55\x00\x00\x00\x08\x00\x02\x00\x0a\x0a\x00\x01\x08\x00\x01\x00\x0a\x0a\x00\x01"
], 40, 0, NULL, 0) = 40</pre></blog-code></li></ol></blog-footnotes></blog-article><style type=text/css>tt{font-weight:700}</style><blog-article posted=2023-04-13T00:50:27Z><h1 slot=title>Running SunOS 4 in QEMU (SPARC)</h1><div slot=summary><p><a href=https://en.wikipedia.org/wiki/SunOS>SunOS</a> is a historical UNIX operating system widely used from the mid 80s into the early/mid 90s. Older versions of QEMU struggled to emulate the SPARC platform that SunOS ran on, but QEMU v7.2 supports SPARC well enough to install and run SunOS without any unusual workarounds.</p></div><blog-section><h2 slot=title>Installation media</h2><p>The installation CD-ROM for SunOS 4.1.4 (also branded Solaris 1.1.2) is available on the Internet Archive:</p><ul><li><a href=https://archive.org/details/solaris112sparc>Solaris v1.1.2 SPARC (704-4662-10)</a> (uploaded 2019-09-23)</li></ul><p>You might also want a dump of the SPARCstation 5 boot PROM. QEMU's bundled OpenBIOS is capable of booting SunOS, but the original PROM is useful for people who want a more authentic emulation experience.</p><ul><li><a href=https://github.com/andarazoroflove/sparc/raw/master/ss5.bin>https://github.com/andarazoroflove/sparc/raw/master/ss5.bin</a></li><li><a href=http://vtda.org/bits/ROMs/Sun/ss5.bin>http://vtda.org/bits/ROMs/Sun/ss5.bin</a></li></ul><blog-code syntax=commands><pre>
shasum -a 256 *
# 559c8455918029ffdaaf9890caf9f791c3a3604d2f2158793751b770593c0a3c SunOS-v4.1.4.iso
# e7f40845504c65f4011278aa3e97a9810aa36775e6c199b715839fbc25eec45e ss5.bin</pre></blog-code></blog-section><blog-section><h2 slot=title>Preparing the SunOS mini-root</h2><p>The first stage of the SunOS installation process is to prepare a minimal bootable environment.</p><p>SunOS is designed to run on Sun's hardware, so it's relatively fussy about device layout and configuration compared to an OS intended for consumer hardware. The <a href=https://docs.oracle.com/cd/E19127-01/sparc5.ws/801-6396-11/801-6396-11.pdf>SPARCstation 5 Service Manual</a> is a useful reference.</p><ul><li>The internal HDD must have SCSI target 3, and the internal CD-ROM must have SCSI target 6.</li><li>SunOS expects the CD-ROM to have a physical block size of 512 bytes<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref>.</li><li>Although a real SPARCstation 5 supports up to 256 MiB of RAM, we'll be giving it only 64 MiB to simplify the installation process<blog-footnote-ref>[<a href=#fn:2>2</a>]</blog-footnote-ref></li></ul><p>Leave off the <tt>-bios ss5.bin</tt> line to use QEMU's built-in OpenBIOS.</p><blog-code syntax=commands><pre>
qemu-system-sparc -version
# QEMU emulator version 7.2.1
# Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
qemu-img create -f qcow2 sunos-hdd.img 2G
# Formatting 'sunos-hdd.img', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2147483648 lazy_refcounts=off refcount_bits=16
qemu-system-sparc \
# -machine SS-5 \
# -m 64 \
# -bios ss5.bin \
# -drive file=sunos-hdd.img,bus=0,unit=3,media=disk \
# -device scsi-cd,channel=0,scsi-id=6,id=cdrom,drive=cdrom,physical_block_size=512 \
# -drive if=none,file=SunOS-v4.1.4.iso,media=cdrom,id=cdrom</pre></blog-code><p>Once at the firmware prompt, type <tt>boot cdrom</tt> (or <tt>boot cdrom:d</tt> for OpenBIOS).</p><div><img src=https://john-millikin.com/by-sha256/f930483c3271679e40e7bffada5f4646116db225bc417d4812a6ef875af0a977/first-boot-ss5.png style=max-width:512px;margin:1em></div><div><img src=https://john-millikin.com/by-sha256/98b29c99084ae8449ff4eefbcf32311fcfa6e552bde34a7f09c76fa1a16bf64f/first-boot-install-prompt.png style=max-width:512px;margin:1em></div><p>In the disk formatter, select disk type 13 (<tt>SUN2.1G</tt>), write the label to disk, then quit the formatting utility.</p><div><img src=https://john-millikin.com/by-sha256/a30764a4ca8e8bbc4a5e0b288db6f0142382d63cbdf6eb0544d0fd3945b8c2b2/first-boot-format-prompt.png style=max-width:512px;margin:1em></div><div><img src=https://john-millikin.com/by-sha256/ef9f1fb5d3f80f8bb4289f48bc07be18033c9e71b39ed9f46ac062cf6f7f3f12/first-boot-format-done.png style=max-width:512px;margin:1em></div><p>The installation script will prep the disk for the main installer, then prompt for a reboot.</p><p>If using OpenBIOS, the VM might not boot into mini-root by itself. Type <tt>boot disk0:b -sw</tt> at the firmware prompt to continue.</p></blog-section><blog-section><h2 slot=title>Installing SunOS itself</h2><p>After rebooting, you should see some logspam and a root prompt. Run <tt>suninstall</tt> to continue the installation process.</p><div><img src=https://john-millikin.com/by-sha256/f3585168e7aeb2ee9b693b31e1d0de050d503cd760f8035e42de0cf89a70d41c/second-boot-root-prompt.png style=max-width:512px;margin:1em></div><p>There's no complicated decisions to make here, so I just went with the quick install of the full system.</p><div><img src=https://john-millikin.com/by-sha256/393e961d7bb01c02dd152de851aa78978d185f66083ba9b1ee4dfcf92a8aa76a/suninstall-standard-installations.png style=max-width:512px;margin:1em></div><p>After the installation is finished the VM will reboot and you'll be back at the firmware prompt. Type <tt>boot disk</tt> (or <tt>boot disk3:a</tt> for OpenBIOS) to boot.</p><p>In its original environment, a new SunOS workstation would have received its network configuration from <a href=https://en.wikipedia.org/wiki/Reverse_Address_Resolution_Protocol>RARP</a> (the predecessor of DHCP) and <a href=https://en.wikipedia.org/wiki/Network_Information_Service>NIS</a> (sort of a proto-LDAP). Since we don't have a lab of 100 workstations to provision, manual data entry is fine.</p><div><img src=https://john-millikin.com/by-sha256/037a765488c67be16f8cb64c5575e449ac394867d59e21b2ffbf82886575b25e/system-setup.png style=max-width:512px;margin:1em></div><div><img src=https://john-millikin.com/by-sha256/0b4bcf9df79e25fb89551d9234442ac64273ea13ecc4fd14f1dc6e00edd45899/manual-setup.png style=max-width:512px;margin:1em></div><p>The default IP address for QEMU's usermode networking is 10.0.2.15, for which SunOS will assign a netmask of 0xFF000000<blog-footnote-ref>[<a href=#fn:3>3</a>]</blog-footnote-ref>.</p><div><img src=https://john-millikin.com/by-sha256/3fbf9a8bb6f06ebac21fc433ec382219e6f84abf889dd44ea18fbe343f775183/network-setup.png style=max-width:512px;margin:1em></div><p><q>A password should be six to eight characters long</q> 🔒.</p><div><img src=https://john-millikin.com/by-sha256/9f644a2579a79206fb8848c748ab0b6331a00e26d3fbd5edced4a41684897e85/root-password.png style=max-width:512px;margin:1em></div><p>Almost done. 
The last step is to configure the gateway router, and then the VM will have working networking. Just log in as root, set the gateway address, and write it to <tt>/etc/defaultrouter</tt> so it'll persist across reboots.</p><div><img src=https://john-millikin.com/by-sha256/f332958b7f1abf3eb611ffa455a683ae47d366143c4c14bd1fd65c077468c274/default-route-ping.png style=max-width:512px;margin:1em></div><p>Log in as a non-root user to launch the native graphical UI of SunOS, OpenWindows.</p><div><img src=https://john-millikin.com/by-sha256/d21350fcf093cda00d40f26bfa28ae0cdc63c0741e4b77d93538dd1f23603d48/openwindows.png style=max-width:512px;margin:1em></div></blog-section><blog-section><h2 slot=title>Installing a web browser (Netscape)</h2><p>The final version of SunOS was released when the Web was in its infancy, and therefore does not have a bundled web browser (or any sort of HTTP-related utilities). Luckily for us SunOS/SPARC was a popular platform and Netscape published binaries for it. Actually <i>finding</i> those binaries was a bit of a slog, but I eventually located a copy of Netscape Communicator v4.61 on the delightfully retro page <a href=http://www.floodgap.com/retrobits/solace/>The Solbourne Solace @ Floodgap Retrobits</a> (<a href=https://web.archive.org/web/20230325091337/http://www.floodgap.com/retrobits/solace/>archive</a>).</p><p>In the least surprising twist ever, the tarball itself is only available via Gopher, at <a href=gopher://gopher.floodgap.com/9/archive/sunos-4-solbourne-os-mp/communicator-v461-us.sparc-sun-sunos4.1.3_U1.tar.gz>gopher://gopher.floodgap.com/9/archive/sunos-4-solbourne-os-mp/communicator-v461-us.sparc-sun-sunos4.1.3_U1.tar.gz</a>. I have mirrored it to archive.org at <a href=https://archive.org/details/netscape-communicator-v461-us.sparc-sun-sunos4.1.3_U1>Netscape Communicator 4.61 [SunOS 4.1.3]</a>.</p><p>In any case, once you've obtained a copy of the Netscape installation package you'll find that it needs gzip, which at the time was a GNU-specific technology. I recommend following the manual installation instructions from <tt>README.install</tt> on your host machine to produce a plain tarball.</p><blog-code syntax=commands><pre>
shasum -a 256 communicator-v461-us.sparc-sun-sunos4.1.3_U1.tar.gz
# c667feb3a73721872d60ffd4aab24e39be8d5a48761397b4dd2184b4dd2bb5de communicator-v461-us.sparc-sun-sunos4.1.3_U1.tar.gz
tar -xf communicator-v461-us.sparc-sun-sunos4.1.3_U1.tar.gz
cd communicator-v461.sparc-sun-sunos4.1.3_U1/
mkdir -p netscape-v4.61/java/classes
mv *.nif netscape-v4.61/
mv *.jar netscape-v4.61/java/classes/
cd netscape-v4.61/
gzip -dc netscape-v461.nif | tar -xf -
gzip -dc nethelp-v461.nif | tar -xf -
gzip -dc spellchk-v461.nif | tar -xf -
cd ..
tar -cf ../netscape-v4.61.tar netscape-v4.61/</pre></blog-code><p>Getting that tarball into the VM is also a little tricky due to the lack of common network protocols between 1994 and 2023. I ended up writing a helper (<a href=#recv-c>recv.c</a>) that will connect to a TCP socket and stream any data it receives to a file.</p><blog-code syntax=commands><pre>
# # host (Linux, BSD, and most others)
nc -Nl 127.0.0.1 5000 < netscape-v4.61.tar
#
# # host (macOS)
nc -l 127.0.0.1 5000 < netscape-v4.61.tar</pre></blog-code><p></p><blog-code syntax=commands prompt=%><pre>
# # VM
cc -o recv recv.c
./recv 10.0.2.2:5000 netscape-v4.61.tar</pre></blog-code><p>Unpack that tarball, write a wrapper script and a stub <tt>/etc/resolv.conf</tt>, and Netscape is ready to go.</p><blog-code syntax=commands prompt=%><pre>
cat /etc/resolv.conf
# domain sunos.local
# nameserver 10.0.2.3
cat ~/netscape.sh
# #!/bin/sh
# XNLSPATH="${HOME}/netscape-v4.61/nls"
# XKEYSYMDB="${HOME}/netscape-v4.61/XKeysymDB"
# export XNLSPATH XKEYSYMDB
# exec "${HOME}/netscape-v4.61/netscape_dns" "$@"</pre></blog-code><div><img src=https://john-millikin.com/by-sha256/6be1f117f67e6cc235088f2788b3542c767e99f3fb5eaba695e38ee717480695/netscape.png style=max-width:512px;margin:1em></div></blog-section><blog-section id=recv-c><h2 slot=title>Appendix A: recv.c</h2><p>This should be fairly readable despite being written in K&R C; the BSD sockets API hasn't changed much.</p><p>If you don't want to type the whole thing in by hand, see the next section about X11 forwarding.</p><blog-code syntax=c><pre>
#include <arpa/inet.h>
#include <fcntl.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
int split_server_address(server_address, server_ip, server_port)
char *server_address;
unsigned long *server_ip;
unsigned short *server_port;
{
    char *port_str, *port_extra;
    long port_raw;
    port_str = strchr(server_address, ':');
    if (port_str == NULL) {
        return -1;
    }
    *(port_str++) = 0;
    *server_ip = inet_addr(server_address);
    if (*server_ip == -1) {
        return -1;
    }
    port_raw = strtol(port_str, &port_extra, 10);
    if (port_raw < 1 || port_raw > 65535) {
        return -1;
    }
    if (*port_extra != 0) {
        return -1;
    }
    *server_port = port_raw;
    return 0;
}
int recv_file(server_ip, server_port, output_path)
unsigned long server_ip;
unsigned short server_port;
char *output_path;
{
    int socket_fd, output_fd;
    struct sockaddr_in server;
    char buffer[2048];
    socket_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (socket_fd == -1) {
        return -1;
    }
    memset(&server, 0, sizeof server);
    server.sin_family = AF_INET;
    server.sin_addr.s_addr = server_ip;
    server.sin_port = htons(server_port);
    if (connect(socket_fd, (struct sockaddr*)&server, sizeof server) == -1) {
        return -1;
    }
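    /* Note (my annotation): O_TRUNC is omitted here, so re-running against
       an existing, larger output file would leave stale bytes at its end. */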
    output_fd = open(output_path, O_WRONLY | O_CREAT, 0600);
    if (output_fd == -1) {
        return -1;
    }
    while (1) {
        int n = read(socket_fd, buffer, sizeof buffer);
        if (n == -1) {
            close(output_fd);
            return -1;
        }
        if (n == 0) {
            return close(output_fd);
        }
        write(output_fd, buffer, n);
    }
}
int main(argc, argv)
int argc;
char **argv;
{
    unsigned long server_ip;
    unsigned short server_port;
    if (argc < 3) {
        fprintf(stderr, "Usage: %s <server_address> <output_path>\n", argv[0]);
        return 1;
    }
    if (split_server_address(argv[1], &server_ip, &server_port) == -1) {
        fprintf(stderr, "Invalid server address \"%s\"\n", argv[1]);
        return 1;
    }
    if (recv_file(server_ip, server_port, argv[2]) == -1) {
        perror("Error receiving file");
        return 1;
    }
    return 0;
}
</pre></blog-code></blog-section><blog-section><h2 slot=title>Appendix B: X11 forwarding</h2><p>The experience of interacting with a GUI from 1994 via QEMU's console is not great, so I recommend running an X11 server on your host and having the VM connect to it.</p><p>If you're already running an X11-based desktop (BSD, older Linux, macOS with XQuartz<blog-footnote-ref>[<a href=#fn:4>4</a>]</blog-footnote-ref>) then you can proxy its socket directly to TCP and then connect to it from the VM. This will let you copy-paste big blobs of text such as <a href=#recv-c>recv.c</a>.</p><blog-code syntax=commands><pre>
# # host
socat TCP-LISTEN:6001,fork,bind=127.0.0.1 UNIX-CONNECT:/tmp/.X11-unix/X0</pre></blog-code><blog-code syntax=commands prompt=%><pre>
# # VM
setenv DISPLAY 10.0.2.2:1
xterm</pre></blog-code><p>Alternatively, use a nested X11 server such as Xnest or Xephyr. You'll be able to run the OpenWindows window manager, so it feels a bit like using VNC.</p><blog-code syntax=commands><pre>
# # host
Xephyr -ac -listen tcp -screen 2048x1536 :1</pre></blog-code><blog-code syntax=commands prompt=%><pre>
# # VM
setenv DISPLAY 10.0.2.2:1
olwm</pre></blog-code><div><a href=https://john-millikin.com/by-sha256/697985b5647b1048a4e17dd888d0dd5b1dee3d18ce336dc3fb1dad39b5209daf/xephyr.png><img src=https://john-millikin.com/by-sha256/697985b5647b1048a4e17dd888d0dd5b1dee3d18ce336dc3fb1dad39b5209daf/xephyr.png style=max-width:512px;margin:1em></a></div><p>If <tt>olwm</tt> segfaults on startup, make sure that the host machine has the legacy X11 fonts installed. In Ubuntu 22.04 I had to install the <tt>xfonts-100dpi</tt> package.</p></blog-section><blog-footnotes><hr><ol><li id=fn:1><p>Nowadays the physical block size for CD-ROMs is 2048 bytes, but in the 90s this value wasn't standardized yet. Consumer CD-ROM drives had a physical jumper on the back that could select the block size, and some OSes (including SunOS) would encounter read errors if the jumper wasn't set to what they expected.</p></li><li id=fn:2><p>SunOS requires a swap partition that is at least as large as machine memory, and the default swap partition size for <tt>SUN2.1G</tt> is 100 MiB. Using 64 MiB lets us avoid fiddling with the disk geometry in the formatting tool.</p></li><li id=fn:3><p>SunOS pre-dates CIDR, so it thinks of all 10.x.x.x addresses as belonging to the 10.0.0.0/8 "Class A" network. This is technically wrong for QEMU, which by default uses a netmask of 0xFFFFFF00, but it doesn't really matter as long as you don't try to do anything too complicated with multi-VM networking.</p></li><li id=fn:4><p>Note that the default socket path for XQuartz may contain a colon, which will make socat unhappy because it uses colons as part of its option syntax. You can work around this with a symlink.</p></li></ol></blog-footnotes></blog-article><blog-article posted=2023-04-11T02:17:00Z><h1 slot=title>Improved UNIX socket networking in QEMU 7.2</h1><div slot=summary><p><a href=https://www.qemu.org/2022/12/14/qemu-7-2-0/>QEMU 7.2</a> quietly introduced two new network backends, <tt>-netdev dgram</tt> and <tt>-netdev stream</tt>. Unlike the older <tt>-netdev socket</tt>, these new backends directly support <tt>AF_UNIX</tt> socket addresses without the need for an intermediate wrapper tool.</p></div><blog-section><h2 slot=title>The situation up until now</h2><p>QEMU has a <tt>-netdev socket</tt> network backend, which will send/receive Ethernet frames via TCP (the <tt>connect=</tt> and <tt>listen=</tt> modes) or UDP (the <tt>mcast=</tt> and <tt>udp=</tt> modes). This functionality isn't well documented, and its intended use appears to be as a sort of simple network hub for hosts that can't use a TAP device<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref>.</p><blockquote><pre>$ qemu-system-x86_64 --help
[...]
-netdev socket,id=str[,fd=h][,listen=[host]:port][,connect=host:port]
configure a network backend to connect to another network
using a socket connection
-netdev socket,id=str[,fd=h][,mcast=maddr:port[,localaddr=addr]]
configure a network backend to connect to a multicast maddr and port
use 'localaddr=addr' to specify the host address to send packets from
-netdev socket,id=str[,fd=h][,udp=host:port][,localaddr=host:port]
configure a network backend to connect to another network
using an UDP tunnel</pre></blockquote><p>A less-obvious (and <i>completely</i> undocumented) behavior of <tt>-netdev socket</tt> is that (1) the <tt>fd=</tt> syntax is actually its own mutually-exclusive mode, and (2) it doesn't need to be the file descriptor of a TCP socket in particular. This means it's possible to coax QEMU into using a UNIX socket for its network backend, by connecting to the socket in a wrapper process before spawning QEMU. The wrapper doesn't have to be complex; see <a href=#qemu-wrapper-c>qemu-wrapper.c</a> for a working example in 50 lines of C.</p><p>Whatever process created the UNIX socket can of course do whatever it needs to with the raw Ethernet frames it receives, including acting as a switch or VPN or whatever. If you don't already have a preferred usermode network library, I recommend <a href=https://scapy.net/>Scapy</a> as a comprehensive and beginner-friendly option. For a starting point, try using <a href=#print-frames-py>print-frames.py</a> to log network traffic of a Debian live CD:</p><div><img src=https://john-millikin.com/by-sha256/5bdb157d59ab761707c9ff5404cee640fac578b701400ebf3ea8288fde90f368/netdev-socket.png style=max-width:800px></div></blog-section><blog-section><h2 slot=title>New backends in QEMU 7.2</h2><p>The <a href=https://www.qemu.org/2022/12/14/qemu-7-2-0/>QEMU 7.2</a> release adds two new network backends, <tt>-netdev dgram</tt> and <tt>-netdev stream</tt>. Although the related mailing list discussion<blog-footnote-ref>[<a href=#fn:2>2</a>]</blog-footnote-ref> makes it clear that the new functionality exists to better support UNIX sockets, in classic QEMU fashion this minor detail has been left out of the documentation<blog-footnote-ref>[<a href=#fn:3>3</a>]</blog-footnote-ref>.</p><blockquote><pre style=overflow:auto>
-netdev stream,id=str[,server=on|off],addr.type=inet,addr.host=host,addr.port=port[,to=maxport][,numeric=on|off][,keep-alive=on|off][,mptcp=on|off][,addr.ipv4=on|off][,addr.ipv6=on|off]
-netdev stream,id=str[,server=on|off],addr.type=unix,addr.path=path[,abstract=on|off][,tight=on|off]
-netdev stream,id=str[,server=on|off],addr.type=fd,addr.str=file-descriptor
configure a network backend to connect to another network
using a socket connection in stream mode.
-netdev dgram,id=str,remote.type=inet,remote.host=maddr,remote.port=port[,local.type=inet,local.host=addr]
-netdev dgram,id=str,remote.type=inet,remote.host=maddr,remote.port=port[,local.type=fd,local.str=file-descriptor]
configure a network backend to connect to a multicast maddr and port
use ``local.host=addr`` to specify the host address to send packets from
-netdev dgram,id=str,local.type=inet,local.host=addr,local.port=port[,remote.type=inet,remote.host=addr,remote.port=port]
-netdev dgram,id=str,local.type=unix,local.path=path[,remote.type=unix,remote.path=path]
-netdev dgram,id=str,local.type=fd,local.str=file-descriptor
configure a network backend to connect to another network
using an UDP tunnel
</pre></blockquote><p>The <tt>-netdev stream</tt> backend works just like the pseudo-TCP example above, but doesn't require a wrapper:</p><div><img src=https://john-millikin.com/by-sha256/b7fde3b5d5736a80365d40317e2973012d03875429dc66723de3b4a478b44911/netdev-stream.png style=max-width:800px></div><p>The <tt>-netdev dgram</tt> backend is a bit different. Since datagrams are inherently unidirectional, frames sent to the host use a separate socket from frames sent to the guest. The receiving program also needs to be adjusted, because QEMU (reasonably) doesn't length-prefix datagrams.</p><p><a href=#print-frames-dgram-arp-py>print-frames-dgram-arp.py</a> is an expanded version of the earlier example. It waits for the VM to send an ARP request for address 192.168.100.101, then prints any frames received after that request. Within the VM I turned off Avahi (noisy), manually configured the network, and used Python to send a UDP packet.</p><p>Within the <tt>-netdev dgram</tt> flag, the value of <tt>local.path=</tt> is the socket address that the host will send frames to, and <tt>remote.path=</tt> is the socket address that the host will receive frames from.</p><div><img src=https://john-millikin.com/by-sha256/88f56835840fee177540792f94e91b767be408481362ecd2c0c0d2c8c8bc8fd3/netdev-dgram.png style=max-width:400px>
<img src=https://john-millikin.com/by-sha256/ed457ab4ab465b4dbadde41c6746dbb6b6a6d3c98f5a8ee9e2cf4e39a864681e/hello-from-vm.png style=max-width:400px></div><p>Despite my general crankiness about the docs coverage, I'm quite happy to see this functionality land. Native support for <tt>AF_UNIX</tt> datagrams is exciting (for a certain type of person) because it eliminates a lot of the complexity involved in wiring up QEMU with a userspace network stack. Using UNIX sockets means you don't need to worry about port conflicts, it doesn't need TAP so it's sandbox-friendly, and the VM's network won't break if the packet processor restarts.</p></blog-section><blog-section id=qemu-wrapper-c><h2 slot=title>Appendix A: qemu-wrapper.c</h2><p>Nothing fancy here, it just creates a socket and connects it to a user-provided path.</p><blog-code syntax=c><pre>
/* Copyright (c) John Millikin <john@john-millikin.com> */
/* SPDX-License-Identifier: 0BSD */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>
int main(int argc, char **argv) {
    int sock_fd, rc;
    char *sock_path;
    size_t sock_path_len;
    struct sockaddr_un sock_addr = {AF_UNIX, ""};
    if (argc < 3) {
        fprintf(stderr, "Usage: %s <socket> <qemu> [args...]\n", argv[0]);
        return 1;
    }
    sock_path = argv[1];
    sock_path_len = strlen(sock_path);
    if (sock_path_len >= sizeof sock_addr.sun_path) {
        fprintf(stderr, "Socket path \"%s\" too long\n", sock_path);
        return 1;
    }
    memcpy(sock_addr.sun_path, sock_path, sock_path_len + 1);
    sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sock_fd == -1) {
        perror("Failed to create socket");
        return 1;
    }
    rc = connect(sock_fd, (struct sockaddr*)&sock_addr, sizeof sock_addr);
    if (rc == -1) {
        fprintf(stderr, "Failed to connect to socket \"%s\": ", sock_path);
        perror(NULL);
        return 1;
    }
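    /* Note (my annotation): the connected descriptor is intentionally left
       open across execv(), so the spawned QEMU can reference it with
       -netdev socket,fd=N; with this wrapper N is typically 3, the first
       descriptor after stdin/stdout/stderr. */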
    execv(argv[2], argv + 2);
    fprintf(stderr, "%s: ", argv[2]);
    perror(NULL);
    return 1;
}</pre></blog-code></blog-section><blog-section id=print-frames-py><h2 slot=title>Appendix B: print-frames.py</h2><p>Reads Ethernet frames from a socket, then uses <a href=https://scapy.net/>Scapy</a> to parse and print them.</p><p>The expected format of the TCP stream doesn't seem to be documented. In my testing the Ethernet frames were always prefixed with their length as a big-endian 32-bit uint.</p><blog-code syntax=python><pre>
#!/usr/bin/python3
# Copyright (c) John Millikin <john@john-millikin.com>
# SPDX-License-Identifier: 0BSD
import os
import os.path
import socket
import struct
import sys
from scapy import all as scapy
socket_path = sys.argv[1]
if os.path.exists(socket_path):
    os.remove(socket_path)
sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.bind(socket_path)
sock.listen(1)
conn, addr = sock.accept()
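# Caveat (my note, not in the original): recv() on a SOCK_STREAM socket may
# return fewer bytes than requested. A robust reader would loop until the
# full length prefix and frame arrive; this demo keeps the happy path only.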
while True:
    frame_len_buf = conn.recv(4)
    if len(frame_len_buf) == 0:
        break
    (frame_len,) = struct.unpack("!L", frame_len_buf)
    frame = scapy.Ether(conn.recv(frame_len))
    print(repr(frame))
    print("")</pre></blog-code></blog-section><blog-section id=print-frames-dgram-arp-py><h2 slot=title>Appendix C: print-frames-dgram-arp.py</h2><p>Similar to the above, but adjusted for unidirectional sockets and expanded to verify that sending frames (ARP responses) to the VM works as expected. Within the VM, ping 192.168.100.101 and watch the ICMP frames come through.</p><blog-code syntax=python><pre>
#!/usr/bin/python3
# Copyright (c) John Millikin <john@john-millikin.com>
# SPDX-License-Identifier: 0BSD
import os
import os.path
import socket
import sys
from scapy import all as scapy
send_socket_path = sys.argv[1]
recv_socket_path = sys.argv[2]
if os.path.exists(recv_socket_path):
    os.remove(recv_socket_path)
send_sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
recv_sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
recv_sock.bind(recv_socket_path)
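# Note (mine): QEMU sends each Ethernet frame as a single datagram with no
# length prefix, so recv() returns exactly one whole frame; 9001 bytes is
# comfortably larger than a jumbo frame.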
ready = False
while True:
    frame = scapy.Ether(recv_sock.recv(9001))
    if ready:
        print(repr(frame))
        print("")
    if not isinstance(frame.payload, scapy.ARP):
        continue
    if frame.payload.op != 1: # who-has
        continue
    if frame.payload.pdst == "192.168.100.101":
        resp_bytes = scapy.raw((scapy.Ether(
            dst="52:54:00:12:34:56",
            src="52:54:00:12:34:ff",
        ) / scapy.ARP(
            op=2, # is-at
            hwsrc="52:54:00:12:34:ff",
            psrc="192.168.100.101",
            hwdst="52:54:00:12:34:56",
            pdst="192.168.100.100",
        )))
        send_sock.sendto(resp_bytes, send_socket_path)
        ready = True</pre></blog-code></blog-section><blog-footnotes><hr><ol><li id=fn:1><p><a href=https://wiki.qemu.org/Documentation/Networking#Socket>https://wiki.qemu.org/Documentation/Networking#Socket</a></p></li><li id=fn:2><p><a href=https://lore.kernel.org/all/20220722185701.300449-1-lvivier@redhat.com/>[PATCH v7 00/14] qapi: net: add unix socket type support to netdev backend</a></p></li><li id=fn:3><p>In case the reader thinks I'm being unfair by expecting <tt>--help</tt> output to have more detail, consider that the QEMU documentation page <a href=https://qemu.readthedocs.io/en/v7.2.0/system/invocation.html>System Emulation » Invocation</a> is the most complete reference I can find for QEMU's <tt>-netdev</tt> flags, and it doesn't even <i>mention</i> the <tt>dgram</tt> or <tt>stream</tt> network backends.</p></li></ol></blog-footnotes></blog-article><blog-article posted=2022-08-23T06:09:54Z><h1 slot=title>Debugging Win32 binaries in Ghidra via Wine</h1><div slot=tableofcontents></div><div slot=summary><div style="float:right;padding:0 0 0 2em"><img src=https://john-millikin.com/by-sha256/e8ac6f818477049205c63a3e23b2396f1b42a31620b9ad43eb9f54268aa9c57a/08_memory-regions.png style=max-width:400px></div><p><a href=https://ghidra-sre.org/>Ghidra</a> is a cross-platform reverse-engineering and binary analysis tool, with recent versions including support for dynamic analysis. I want to try using it as a replacement for IDA Pro for reverse-engineering Win32 binaries, but hit bugs related to address space detection when running gdbserver with Wine (<a href=https://github.com/NationalSecurityAgency/ghidra/issues/4534>ghidra#4534</a>).</p><p>This post contains custom GDB commands that allow Ghidra to query the Linux process ID and memory maps of a Win32 target process running in 32-bit Wine on a 64-bit Linux host.</p></div><blog-section><h2 slot=title>Building a simple Win32 binary on Linux</h2><p>If you've already got a Win32 binary you're interested in analyzing, you can skip this step.</p><p>For the purposes of testing and writing blog posts, it's useful to have a simple "hello world" binary that doesn't have much fancy stuff going on. This is the code for a minimal Win32 console program:</p><blog-code syntax=c><pre>
#include <windows.h>
static const char message[] = "Hello, world!\n";
static const int message_len = sizeof(message) - 1; /* exclude the trailing NUL */
int __stdcall mainCRTStartup(void) {
    HANDLE stdout = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD bytes_written;
    WriteFile(stdout, message, message_len, &bytes_written, NULL);
    return 0;
}</pre></blog-code><p>To compile a <a href=https://en.wikipedia.org/wiki/Portable_Executable>PE</a> binary in Linux you can either use Wine to install the <a href=https://developer.microsoft.com/en-us/windows/downloads/windows-sdk/>Windows SDK</a>, or use a cross-compiler. The Windows SDK is a bit of a pain to install since it's distributed as an ISO image full of installer wizards, so I chose the second option. For cross-compilation I prefer to use Clang and LLD whenever possible since they're "native" cross-compilers, which means that (unlike GNU GCC/LD) their target platform can be selected at runtime.</p><blog-code syntax=commands><pre>
WINE="${HOME}/.opt/wine-7.15"
clang -target i386-pc-win32 -O2 -c \
# -isystem "${WINE}"/include/wine/windows \
# -isystem "${WINE}"/include/wine/msvcrt \
# hello-win32.c
ld.lld -flavor link \
# /out:hello-win32.exe \
# /nxcompat:no \
# /subsystem:console \
# /defaultlib:kernel32 \
# hello-win32.o</pre></blog-code><p>If you don't have a copy of <tt>kernel32.lib</tt> from the Windows SDK, a usable substitute can be generated from <a href=https://gitlab.winehq.org/wine/wine/-/blob/wine-7.15/dlls/kernel32/kernel32.spec><tt>kernel32.spec</tt></a> in Wine's source tree.</p><blog-code syntax=commands><pre>
WINE_SRC="${HOME}/src/third_party/winehq.org/wine-7.15"
"${WINE}"/bin/winebuild --def \
# -E "${WINE_SRC}"/dlls/kernel32/kernel32.spec \
# -o kernel32.def
llvm-dlltool -m i386 -k -d kernel32.def -l kernel32.lib</pre></blog-code><p>Double-check that the executable works:</p><blog-code syntax=commands><pre>
wine hello-win32.exe
# Hello, world!</pre></blog-code></blog-section><blog-section><h2 slot=title>Debugging with gdbserver.exe</h2><p>First, install both Linux and Windows builds of <a href=https://www.sourceware.org/gdb/>GDB</a>, configured with <tt>--target=i686-w64-mingw32</tt>. On Ubuntu an appropriate build of GDB can be installed with <tt>apt install gdb-mingw-w64 gdb-mingw-w64-target</tt>.</p><p>The <tt>gdbserver.exe</tt> process will run "inside" Wine, and use Windows debugging APIs to control the binary being debugged. It listens on a TCP socket implementing the <a href=https://sourceware.org/gdb/onlinedocs/gdb/Remote-Protocol.html>GDB remote serial protocol</a>.</p><blog-code syntax=commands><pre>
wine /usr/share/win32/gdbserver.exe localhost:10000 ./hello-win32.exe
# Listening on port 10000</pre></blog-code><p>The <tt>i686-w64-mingw32-gdb</tt> process runs in the Linux environment, and provides a REPL that can control the "remote" gdbserver. This process is necessary because Ghidra doesn't directly speak the GDB serial protocol; it controls GDB through the text UI. Before starting up Ghidra, verify that the GDB bits are working:</p><blog-code syntax=commands><pre>
/usr/bin/i686-w64-mingw32-gdb</pre></blog-code><blog-code syntax=commands prompt=(gdb)><pre>
file ~/ghidra/hello-win32.exe
# Reading symbols from ~/ghidra/hello-win32.exe...
# (No debugging symbols found in ~/ghidra/hello-win32.exe)
target extended-remote :10000
# Remote debugging using :10000
# Reading C:/windows/system32/ntdll.dll from remote target...
# warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
# Reading C:/windows/system32/kernel32.dll from remote target...
# Reading C:/windows/system32/kernelbase.dll from remote target...
# 0x7bc7eb01 in ?? ()</pre></blog-code></blog-section><blog-section><h2 slot=title>Connecting Ghidra to GDB</h2><p>Create the Ghidra project, import the Win32 binary to be analyzed, and enter the Debugger tool. When connecting to GDB you can use either IN-VM or GADP, but GADP is probably better since Ghidra's debugger can get wedged and it's nice to be able to forcefully disconnect by killing the GADP agent.</p><p><a href=https://john-millikin.com/by-sha256/04da0365d1b89e236aea9700b684a12e169444a81b2d7fd53193db40a98132fb/01_start.png><img src=https://john-millikin.com/by-sha256/04da0365d1b89e236aea9700b684a12e169444a81b2d7fd53193db40a98132fb/01_start.png style=max-width:500px></a></p><p><a href=https://john-millikin.com/by-sha256/3e655c75a60cf9bae835893719cfc7167917a9dc1bcf379e0510cc7474192e6e/02_connect-gdb.png><img src=https://john-millikin.com/by-sha256/3e655c75a60cf9bae835893719cfc7167917a9dc1bcf379e0510cc7474192e6e/02_connect-gdb.png style=max-width:500px></a></p><p><a href=https://john-millikin.com/by-sha256/f4292df3e09f457fc63f4ea4a065bd7b2b5578f5b1c888d8d6bc01b127eee51e/03_connected.png><img src=https://john-millikin.com/by-sha256/f4292df3e09f457fc63f4ea4a065bd7b2b5578f5b1c888d8d6bc01b127eee51e/03_connected.png style=max-width:500px></a></p><p>Here's where things start to go wrong. After creating the trace record, Ghidra will start throwing out error popups about trying to access an invalid address space. GitHub issue <a href=https://github.com/NationalSecurityAgency/ghidra/issues/4534>ghidra#4534</a> has some of the nitty-gritty details on what's going on, but in summary Ghidra depends on the GDB command <tt>info proc mappings</tt> to figure out what it can peek at, and GDB doesn't implement that command for Windows targets.</p><p><a href=https://john-millikin.com/by-sha256/b297c63f4d980e7bafada713744f0010855de8b5171bb9f1113001c9a6b9134e/04_record.png><img src=https://john-millikin.com/by-sha256/b297c63f4d980e7bafada713744f0010855de8b5171bb9f1113001c9a6b9134e/04_record.png style=max-width:500px></a></p><p><a href=https://john-millikin.com/by-sha256/6506362a576b78950cdddf6d8ae680d8156408446741b97713caabed5c01725b/05_error-popup.png><img src=https://john-millikin.com/by-sha256/6506362a576b78950cdddf6d8ae680d8156408446741b97713caabed5c01725b/05_error-popup.png style=max-width:500px></a></p></blog-section><blog-section><h2 slot=title>Shimming the GDB memory map</h2><p>There are two problems we're facing here:<ul><li>First, we need to get access to the <tt>/proc/{pid}/maps</tt> file corresponding to the target process, parse it, and render output that matches what Ghidra expects from GDB.</li><li>Second, the gdbserver is running inside Wine and therefore uses Windows process IDs. There's no way to query the Linux process ID for a Windows process; such an API obviously doesn't exist in Windows, and Wine developers have declined to implement it as an extension.</li></ul></p><p>The memory map parsing/formatting sounds tricky but is actually pretty straightforward, because the format of <tt>/proc/{pid}/maps</tt> is <i>almost</i> the same as what GDB's <tt>info proc mappings</tt> produces, and Ghidra doesn't care about the differences. The <a href=https://sourceware.org/gdb/onlinedocs/gdb/Python-API.html>GDB Python API</a> can be used to define a new <tt>remote-proc-mappings</tt> command, which reads <tt>/proc/{pid}/maps</tt> for any process accessible to the remote gdbserver.</p><blog-code syntax=python><pre>
import contextlib
import os
import threading
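# Why the pipe and thread (my note): GDB's "remote get" command can only
# write to a local file path, so the code below hands it /dev/fd/N for the
# write end of a pipe, and drains the read end on a thread so the transfer
# can't block on a full pipe buffer.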
@contextlib.contextmanager
def pipe_fds():
    r_fd, w_fd = os.pipe()
    r_file = os.fdopen(r_fd, mode="rb")
    w_file = os.fdopen(w_fd, mode="wb")
    try:
        yield (r_file, w_file)
    finally:
        r_file.close()
        w_file.close()
class ReadThread(threading.Thread):
    def __init__(self, reader):
        super(ReadThread, self).__init__()
        self.__r = reader
        self.bytes = None
    def run(self):
        self.bytes = bytearray(self.__r.read())
def reformat_line(raw_line):
    split = raw_line.decode("utf-8").split(None, 5)
    # split[0] range
    # split[1] mode
    # split[2] offset
    # split[3] major_minor
    # split[4] inode
    # split[5] object name
    start_addr_s, end_addr_s = split[0].split("-")
    start_addr = int(start_addr_s, 16)
    end_addr = int(end_addr_s, 16)
    if len(split) == 6:
        objfile = split[5]
    else:
        objfile = ""
    return "0x{:X} 0x{:X} 0x{:X} 0x{:X} {} {}\n".format(
        start_addr, end_addr,
        end_addr - start_addr,
        int(split[2], 16),
        split[1],
        objfile,
    )
class RemoteProcMappings(gdb.Command):
    def __init__(self):
        super(RemoteProcMappings, self).__init__("remote-proc-mappings", gdb.COMMAND_STATUS)
    def invoke(self, arg, from_tty):
        argv = gdb.string_to_argv(arg)
        if len(argv) != 1:
            gdb.write("usage: remote-proc-mappings PID\n", gdb.STDERR)
            return
        remote_pid = int(argv[0])
        with pipe_fds() as (r_file, w_file):
            read_thread = ReadThread(reader = r_file)
            read_thread.start()
            maps_path = "/proc/{}/maps".format(remote_pid)
            pipe_writer_path = "/dev/fd/{}".format(w_file.fileno())
            gdb.execute("remote get {} {}".format(maps_path, pipe_writer_path))
            w_file.close()
            read_thread.join()
            raw_bytes = read_thread.bytes
        for raw_line in raw_bytes.split(b"\n"):
            if raw_line:
                gdb.write(reformat_line(raw_line))
RemoteProcMappings()</pre></blog-code><p>Next we need the Linux PID. Luckily(?) Wine allows Win32 binaries to directly invoke Linux syscalls via the <a href=/unix-syscalls/#linux-i386-interrupt><tt>INT 0x80</tt></a> instruction, so a straightforward approach is to inject a <tt>linux_getpid()</tt> function into the target process's address space and then use GDB's <tt>call</tt> command to run it.</p><p>Many Windows binaries have executable stacks (<tt>/nxcompat:no</tt>), which makes this super easy:</p><blog-code syntax=shell><pre>
define getpid-linux-i386
  # MOV eax,20 [SYS_getpid]
  # INT 0x80
  # RET
  set $linux_getpid = {int (void)}($esp-7)
  set {unsigned char[8]}($linux_getpid) = {\
    0xB8, 0x14, 0x00, 0x00, 0x00, \
    0xCD, 0x80, \
    0xC3 \
  }
  output $linux_getpid()
  echo \n
end</pre></blog-code><p>If the above command causes a segfault then the binary was probably compiled with <tt>/nxcompat</tt>, which places the stack in a non-executable mapping. Luckily(?) again, Windows processes map their <tt>.text</tt> segment to a fixed offset (by default <tt>0x401000</tt>), so you can use Ghidra to locate some function padding or an unused error branch or whatever and write the getpid stub there:</p><blog-code syntax=shell><pre>
define getpid-linux-i386
  # MOV eax,20 [SYS_getpid]
  # INT 0x80
  # RET
  set $linux_getpid = {int (void)}0x401020
  set {unsigned char[8]}($linux_getpid) = {\
    0xB8, 0x14, 0x00, 0x00, 0x00, \
    0xCD, 0x80, \
    0xC3 \
  }
  output $linux_getpid()
  echo \n
end</pre></blog-code><p>With these two custom commands defined, it's now possible to override <tt>info proc mappings</tt> to (1) find the Linux pid, and (2) report its memory mappings to Ghidra.</p><blog-code syntax=shell><pre>
source ~/ghidra/getpid-linux-i386.gdb
source ~/ghidra/remote-proc-mappings.py
define info proc mappings
python
remote_pid = gdb.execute("getpid-linux-i386", to_string=True).strip()
gdb.execute("remote-proc-mappings {}".format(remote_pid))
end
end</pre></blog-code><p>Put that into a <tt>wine-win32.gdb</tt> file and source it from Ghidra's GDB interpreter panel. Note that to make Ghidra happy the <tt>info proc mappings</tt> command must be overridden before connecting to the remote gdbserver.</p><p>Since they're regular GDB commands, they can also be used from the command line:</p><blog-code syntax=commands prompt=(gdb)><pre>
file ~/ghidra/hello-win32.exe
# Reading symbols from ~/ghidra/hello-win32.exe...
# (No debugging symbols found in ~/ghidra/hello-win32.exe)
source ~/ghidra/wine-win32.gdb
target extended-remote :10000
# Remote debugging using :10000
# Reading C:/windows/system32/ntdll.dll from remote target...
# warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
# Reading C:/windows/system32/kernel32.dll from remote target...
# Reading C:/windows/system32/kernelbase.dll from remote target...
# 0x7bc7eb01 in ?? ()
getpid-linux-i386
# 1872324</pre></blog-code><p>When loaded into Ghidra's GDB session, the trace recording works and the dynamic analysis functionality (Dynamic panel, Regions panel, etc.) works as expected.</p><p><a href=https://john-millikin.com/by-sha256/c85f6e19edfbd2f162f2bb97008663619a26fc7e3f315c80bfe2fa9e59522db1/06_record-with-getpid.png><img src=https://john-millikin.com/by-sha256/c85f6e19edfbd2f162f2bb97008663619a26fc7e3f315c80bfe2fa9e59522db1/06_record-with-getpid.png style=max-width:500px></a></p><p>Ghidra is able to disassemble code injected at runtime. Here, the Dynamic panel shows our <tt>linux_getpid</tt> code injected at 0x401020.</p><p><a href=https://john-millikin.com/by-sha256/5a31c22dceb78e7175acd1890b7f3d8adc281a2e2392ed0e6c3f83a63ad1061b/07_dynamic-view.png><img src=https://john-millikin.com/by-sha256/5a31c22dceb78e7175acd1890b7f3d8adc281a2e2392ed0e6c3f83a63ad1061b/07_dynamic-view.png style=max-width:500px></a></p></blog-section></blog-article><blog-article posted=2022-07-09T06:51:22Z><h1 slot=title>Running BeOS 5 in QEMU (i386)</h1><div slot=tableofcontents></div><div slot=summary><p><a href=https://en.wikipedia.org/wiki/BeOS>BeOS</a> is an operating system from the '90s, notable for its prescient technical decisions and abject business failure<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref>. It embraced multi-threading at a time when 100 MHz CPUs powered top-shelf workstations, and featured metadata-backed virtual folders ten years before their arrival in mainstream OSes.</p></div><blog-section><h2 slot=title>Installation media</h2><p>The installation CD-ROM for BeOS Pro Edition 5.0 is available on the Internet Archive. It's been uploaded there at least twice, but the content is identical, so either one works. You'll need both the <tt>.bin</tt> and <tt>.cue</tt> files.</p><ul><li><a href=https://archive.org/details/BeosVersion5.0.3IntelAndPpc>BeOS Version 5.0.3 Intel and PPC</a> (uploaded 2017-08-23)</li><li><a href=https://archive.org/details/beos-5.0.3-professional-gobe>BeOS Professional Edition 5.0.3</a> (uploaded 2021-08-18)</li></ul><blog-code syntax=commands><pre>
shasum -a 256 *
# 1889fd6cf5af4259b01c9d1925e62f664effdf9dd88f924dc9b4da41ce1f0106 BeOS_Tools.bin
# 6f4fd9fbf7dff01d27391bee3b8bb27def7ed2fcd978f4b698c220b69eb89af9 BeOS_Tools.cue
# 1889fd6cf5af4259b01c9d1925e62f664effdf9dd88f924dc9b4da41ce1f0106 beos-5.0.3-professional-gobe.bin
# a57d9552cdadbbdbe6f608e8dbe9ac2bec2a010da1ad801fc0176e4d66bb234c beos-5.0.3-professional-gobe.cue</pre></blog-code><p>The BeOS installation media has an unusual layout with three separate filesystems, which must be split to be usable by QEMU<blog-footnote-ref>[<a href=#fn:4>4</a>]</blog-footnote-ref>. Use <a href=http://he.fi/bchunk/>bchunk</a> to extract the bootable ISO 9660 filesystem into a <tt>.iso</tt> file.</p><blog-code syntax=commands><pre>
curl -L -O https://raw.githubusercontent.com/hessu/bchunk/release/1.2.2/bchunk.c
shasum -a 256 bchunk.c
# 34ce2e8c23b41a9f14a7e4f50e14996f2754c27237ba431ede1caaee39e759a6 bchunk.c
gcc -o bchunk bchunk.c
./bchunk BeOS_Tools.bin BeOS_Tools.cue BeOS_Tools.iso
# binchunker for Unix, version 1.2.2 by Heikki Hannikainen <hessu@hes.iki.fi>
# Created with the kind help of Bob Marietta <marietrg@SLU.EDU>,
# partly based on his Pascal (Delphi) implementation.
# Support for MODE2/2352 ISO tracks thanks to input from
# Godmar Back <gback@cs.utah.edu>, Colas Nahaboo <Colas@Nahaboo.com>
# and Matthew Green <mrg@eterna.com.au>.
# Released under the GNU GPL, version 2 or later (at your option).
#
# Reading the CUE file:
#
# Track 1: MODE1/2352 01 00:00:00
# Track 2: MODE1/2352 01 10:48:58
# Track 3: MODE1/2352 01 46:07:03
#
# Writing tracks:
#
# 1: BeOS_Tools.iso01.iso 95/95 MB [********************] 100 %
# 2: BeOS_Tools.iso02.iso 310/310 MB [********************] 100 %
# 3: BeOS_Tools.iso03.iso 236/236 MB [********************] 100 %
shasum -a 256 *.iso
# 5c193d1855ad542f9a40a092a32bf2c6072e273a51d781dbf925a9a02e66d759 BeOS_Tools.iso01.iso
# 0031b4eb35a8ebfcf578d197c2372dfda0f748ef260f44dba2dd93740da35626 BeOS_Tools.iso02.iso
# 26b771b4f22f01b3311b86c82d4a7c2f6d84973b2ed506cf8d65738732f21708 BeOS_Tools.iso03.iso</pre></blog-code><p>Of the split files, <tt>iso01</tt> is bootable. The other two are BeFS filesystems containing x86 and PowerPC installation data.</p><blog-code syntax=commands><pre>
ls -lh *.iso
# -rw-rw-r-- 1 john john 96M Oct 8 14:56 BeOS_Tools.iso01.iso
# -rw-rw-r-- 1 john john 311M Oct 8 14:56 BeOS_Tools.iso02.iso
# -rw-rw-r-- 1 john john 236M Oct 8 14:56 BeOS_Tools.iso03.iso
sudo mount -o loop BeOS_Tools.iso01.iso BeOS_Tools_01
sudo mount -o loop BeOS_Tools.iso02.iso BeOS_Tools_02
sudo mount -o loop BeOS_Tools.iso03.iso BeOS_Tools_03
ls BeOS_Tools_01
# AUTORUN.INF boot.catalog floppy.img GNU Gobe Macintosh Personal PMAGIC
ls BeOS_Tools_02
# apps beos demos home _packages_ preferences var
file BeOS_Tools_02/beos/apps/Terminal
# BeOS_Tools/beos/apps/Terminal: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped
file BeOS_Tools_03/beos/apps/Terminal
# BeOS_Tools/beos/apps/Terminal: header for PowerPC PEF executable</pre></blog-code><p>Note how the x86 edition of BeOS uses a common executable format (<a href=https://en.wikipedia.org/wiki/Executable_and_Linkable_Format>ELF</a>), whereas the PowerPC edition uses <a href=https://en.wikipedia.org/wiki/Preferred_Executable_Format>PEF</a> from early Mac OS. It's an unusual decision.</p></blog-section><blog-section><h2 slot=title>Booting the installer</h2><p>We're now ready to start up QEMU and enter the BeOS graphical installer.</p><blog-code syntax=commands><pre>
qemu-system-i386 -version
# QEMU emulator version 7.0.0
# Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
qemu-img create -f qcow2 beos-5.img 1G
# Formatting 'beos-5.img', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1073741824 lazy_refcounts=off refcount_bits=16
qemu-system-i386 -m 512M \
-drive media=cdrom,file=BeOS_Tools.iso01.iso \
-drive media=cdrom,file=BeOS_Tools.iso02.iso \
-drive file=beos-5.img</pre></blog-code><div><img src=https://john-millikin.com/by-sha256/028754df4b20845987086c7d384b1097519b3f06b06a1e85ef1eca2e54f86553/Screen%20Shot%202022-07-08%20at%2013.12.07.png style=max-width:500px>
<img src=https://john-millikin.com/by-sha256/87c0825ef5524659b01a3e1e5f525306c27b456b80f6255e2dd29c2c88362954/Screen%20Shot%202022-07-08%20at%2013.12.13.png style=max-width:500px></div><p>At this point the screen will go blank and no further progress happens. To proceed, we must use the boot menu to disable BIOS calls.</p><div><img src=https://john-millikin.com/by-sha256/b74c5bb01a023d401997c92fb440a4b21433a2412ff37108c131ec20c712a978/Screen%20Shot%202022-07-08%20at%2013.16.56.png style=max-width:500px>
<img src=https://john-millikin.com/by-sha256/de8f94fc3d6582b7af2e1aab3ff1636af91b83d4511a5810eb27b17ad64cb0ad/Screen%20Shot%202022-07-08%20at%2013.17.02.png style=max-width:500px></div><p>The installer is now able to boot, and it would actually be able to fully install from here. However, BeOS doesn't recognize the QEMU graphics device and therefore defaults to low-resolution greyscale graphics.</p><div><img src=https://john-millikin.com/by-sha256/172ae576e3a6d0185c34249499fe15330a7b28f47251e5704afe7f97fc7a2a17/Screen%20Shot%202022-07-08%20at%2013.17.26.png style=max-width:500px></div><p>Going back to the boot screen, the default video mode can be manually configured to something more reasonable. I picked <tt>1024x768x16</tt> to get color and a bit more usable screen area.</p><div><img src=https://john-millikin.com/by-sha256/291d007b26248b01b23fee6dd19b1e5f5475bc058f504fb872efd441f9f49ef9/Screen%20Shot%202022-07-08%20at%2013.18.31.png style=max-width:500px>
<img src=https://john-millikin.com/by-sha256/11751b7ef439f46056d6421807190a510a92db95b30022726a88f9118b365032/Screen%20Shot%202022-07-08%20at%2013.19.21.png style=max-width:500px></div><p>BeOS is now ready to install.</p><div><img src=https://john-millikin.com/by-sha256/7aa28aaba778356c0e7b6e8c7a24972541d0fd25cff786e12d2d3273c7ea039a/Screen%20Shot%202022-07-08%20at%2013.20.56.png style=max-width:768px></div></blog-section><blog-section><h2 slot=title>Installing BeOS</h2><p>The installation process for BeOS 5 is mostly unremarkable to modern eyes, but remember this thing was a contemporary of Windows 98. The typical installer UI back then used text-mode VGA, and then here comes BeOS with full graphics (the windows repaint on drag!) straight from the installation media.</p><p>Also, I say <i>mostly</i> unremarkable, because there's no modern OS in the world that could install a complete desktop environment (including web browser and development tools) in 265 MB. Chrome is larger than that just by itself<blog-footnote-ref>[<a href=#fn:5>5</a>]</blog-footnote-ref>.</p><div><img src=https://john-millikin.com/by-sha256/4ad6168d5720361eaa75f135b299cd84a269e0d1e138dd39e7a121c2e7716f91/Screen%20Shot%202022-07-08%20at%2013.22.33.png style=max-width:500px></div><div><img src=https://john-millikin.com/by-sha256/4fe870f08e6b43690dde6a367b27e60184e66b5a57665ddcf9f54cfebd868d7f/Screen%20Shot%202022-07-08%20at%2013.23.59.png style=max-width:500px>
<img src=https://john-millikin.com/by-sha256/d94a0f5ebfa42616a0e4c2d3e3f802133459cf3028f755e5f32308c40768f816/Screen%20Shot%202022-07-08%20at%2013.25.36.png style=max-width:500px></div><div><img src=https://john-millikin.com/by-sha256/be7df93c3b118788dbf75ea15cbc2811d2a86962ec915e68c707585dd75eca1c/Screen%20Shot%202022-07-08%20at%2013.32.19.png style=max-width:500px></div></blog-section><blog-section><h2 slot=title>Post-install configuration</h2><p>At this point, BeOS has been installed but still has some emulation issues. It will kernel panic on startup unless BIOS calls are disabled in the boot menu, and the graphics will default to 640x480 greyscale. Also, it doesn't have any network connectivity.</p><p>For networking, the list of supported NICs is available at <a href=https://web.archive.org/web/20010331170926/http://www.be.com/support/guides/beosreadylist_intel.html#network>BeOS Ready List - Intel » BeOS Ready Network Cards and Connections</a>. QEMU's default NIC emulates an Intel e1000, but it can also emulate the NE2000 family supported by BeOS.</p><blog-code syntax=commands><pre>
qemu-system-i386 -m 512M -drive file=beos-5.img -nic user,model=ne2k_pci</pre></blog-code><p>Once booted, go into the BeOS network preferences and enable DHCP. Click "restart networking" to let the changes take effect. The QEMU user-mode networking stack has built-in DHCP and DNS servers, so it doesn't matter how the host system is configured.</p><div><img src=https://john-millikin.com/by-sha256/269351bf8a51a8610f097306298af25ae77a7cc5ce4d1167e944a200739df5e2/Screen%20Shot%202022-07-08%20at%2014.04.03.png style=max-width:768px></div><p>Next up is fixing the default graphics and disabling BIOS calls. Open a terminal into <tt>/boot/home/config/settings/kernel/drivers</tt>. This directory configures the BeOS boot loader; the <tt>sample/</tt> directory contains example config files.</p><p>Using <tt>sample/kernel</tt> and <tt>sample/vesa</tt> as a guide, create two files in <tt>[...]/kernel/drivers</tt>:</p><ul><li>File <tt>kernel</tt> should contain <tt>bios_calls disabled</tt></li><li>File <tt>vesa</tt> should contain <tt>mode 1024 768 16</tt> (or whatever resolution you want)</li></ul><div><img src=https://john-millikin.com/by-sha256/3cddf2bbc2bd64c581b1a49d37bae78f22cfb27d7b3411285301c54739101a93/Screen%20Shot%202022-07-08%20at%2013.59.14.png style=max-width:768px></div><p>BeOS 5.0.3 comes with VIM 4.5, so features like <tt>:split</tt> are available.</p><div><img src=https://john-millikin.com/by-sha256/87a38401ad6456140b65c67549f282e077c80d3c7068b3edf22e81aef9ba00a4/Screen%20Shot%202022-07-08%20at%2014.00.20.png style=max-width:768px></div><p>Booting now works without any special boot menu selection, and the built-in web browser can be used to view any page that allows plaintext<blog-footnote-ref>[<a href=#fn:6>6</a>]</blog-footnote-ref> HTTP clients.</p><div><img src=https://john-millikin.com/by-sha256/37d10e31a25ee7a2834cb03ee320977601cc2d0ce877e63a8c7ee1e8ea21fff4/Screen%20Shot%202022-07-08%20at%2014.14.08.png style=max-width:768px></div></blog-section><blog-footnotes><hr><ol><li id=fn:1><p>Unfortunately for BeOS, technical innovation alone was not enough to pay the bills. After rejecting a $200 million buyout offer from Apple<blog-footnote-ref>[<a href=#fn:2>2</a>]</blog-footnote-ref>, Be Inc struggled to compete with hyper-competitive mid-90s Microsoft. It was swept away in the <a href=https://en.wikipedia.org/wiki/Dot-com_bubble#Bursting_of_the_bubble>dot-com crash</a>, and in 2001 the remaining assets were sold to Palm for $11 million<blog-footnote-ref>[<a href=#fn:3>3</a>]</blog-footnote-ref>.</p></li><li id=fn:2><p><a href=https://www.mercurynews.com/2014/08/29/1996-jobs-is-back-at-apple/>The Mercury News: Jobs is back at Apple</a></p></li><li id=fn:3><p><a href=https://www.latimes.com/archives/la-xpm-2002-jan-01-fi-be1-story.html>Los Angeles Times: Be Inc. Founder Leaving as Firm Nears Closure</a></p></li><li id=fn:4><p>Invoking QEMU with <tt>-drive media=cdrom,file=BeOS_Tools.bin</tt> will fail to locate a boot sector.</p><div><img src=https://john-millikin.com/by-sha256/ba9351ca9739e4978175641820506d9919c024fdb5c3a1dc4401aff4c40ec70a/Screen%20Shot%202022-07-08%20at%2013.01.11.png style=max-width:768px></div></li><li id=fn:5><p>The Linux build of Chrome is slightly larger than BeOS 5, and it doesn't even include an OpenGL teapot demo.</p><blog-code syntax=commands><pre>du -s --si /opt/google/chrome/
# 278M /opt/google/chrome/</pre></blog-code></li><li id=fn:6><p>Minimum requirements for HTTPS have evolved somewhat in the 20 years since the NetPositive browser last saw development, so the modern web can only be accessed via a MITM proxy.</p></li></ol></blog-footnotes></blog-article>2022-07-09T06:51:22ZGmail accepts forged YouTube emails2022-06-01T01:36:59Zurn:uuid:3e3f4197-df0a-45b9-b70c-e34dde015c8d<blog-article posted=2022-06-01T01:36:59Z><h1 slot=title>Gmail accepts forged YouTube emails</h1><div slot=tableofcontents></div><p>This morning I woke up to an official-looking email from YouTube in my inbox, addressed to an address that isn't mine.</p><p style=display:flex><a href=https://john-millikin.com/by-sha256/b83b6b7be5a258d456e0430f509719c67a6e93ec71871ab43f05c77672f46fef/screenshot.png><img src=https://john-millikin.com/by-sha256/f7a53000766fa634632c395dfc3642ac911432cfb38d00ea9862d16330f994cc/screenshot-small.jpg style="max-width:500px;border:2px solid blue;padding:.5em;align-self:flex-start;margin:1em"></a>
<a href=https://john-millikin.com/by-sha256/335eb94570a5883a1a008f93a2e541fcdac9646b9e0f2cfe660e69357488c6d3/dkim-dmarc-pass.png><img src=https://john-millikin.com/by-sha256/335eb94570a5883a1a008f93a2e541fcdac9646b9e0f2cfe660e69357488c6d3/dkim-dmarc-pass.png style="max-width:500px;border:2px solid blue;padding:.5em;align-self:flex-start;margin:1em"></a></p><p>Long ago this sort of thing would happen if someone sent an email with forged headers<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref> (e.g. to fish for logins), but the advent of <a href=https://en.wikipedia.org/wiki/DomainKeys_Identified_Mail>DKIM</a> and <a href=https://en.wikipedia.org/wiki/DMARC>DMARC</a> has relegated header forging to ancient history. I was greatly surprised to see that the forged email had passed Gmail's DKIM/DMARC checks.</p><p>A selection of the email's headers (<a href=https://john-millikin.com/by-sha256/1b1955f1efca511c54b29a0aaed74e3de26ac3b0a5ff974337e63a2d32f21a94/original-email.txt>full email</a>) shows that it was accepted as coming from <tt>youtube.com</tt>, despite being received from <tt>robtoledoyour.com</tt>. I'm not familiar enough with the details of email authentication to say <i>why</i> this passed, but it seems pretty clear that something has gone wrong.</p><blog-code><pre>
Delivered-To: jmillikin@gmail.com
Received: by 2002:a19:6d05:0:0:0:0:0 with SMTP id i5csp3611067lfc;
Tue, 31 May 2022 10:35:25 -0700 (PDT)
From: YouTube <no-reply@youtube.com>
To: alltimecaptaincool2019@gmail.com
Date: Fri, 26 Nov 2021 22:16:25 -0800
[...]
ARC-Authentication-Results: i=2; mx.google.com;
dkim=pass header.i=@robtoledoyour.com header.s=prime header.b=On+Vo8dl;
dkim=pass header.i=@youtube.com header.s=20210112 header.b=xGMHx3cn;
arc=pass (i=1 spf=pass spfdomain=scoutcamp.bounces.google.com dkim=pass dkdomain=youtube.com dmarc=pass fromdomain=youtube.com);
spf=pass (google.com: domain of postalerts@robtoledoyour.com designates 2a01:7c8:bb01:51a::7 as permitted sender) smtp.mailfrom=postalerts@robtoledoyour.com;
dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=youtube.com
Return-Path: <postalerts@robtoledoyour.com>
Received: from 7n.robtoledoyour.com (7n.robtoledoyour.com. [2a01:7c8:bb01:51a::7])
</pre></blog-code><p>Whoever is behind this has been active since at least August 2021 – I found references to that <tt>from:</tt> address on Twitter and Reddit:<ul><li>[2021-08-19] <a href=https://old.reddit.com/r/indiasocial/comments/p7jby3/is_this_some_new_kind_of_spam_or_a_glitch/>https://old.reddit.com/r/indiasocial/comments/p7jby3/is_this_some_new_kind_of_spam_or_a_glitch/</a> (<a href=https://archive.ph/MtU14>archive</a>)</li><li>[2022-03-17] <a href=https://twitter.com/CTF/status/1504458796206374918>https://twitter.com/CTF/status/1504458796206374918</a> (<a href=https://archive.ph/diofK>archive</a>)</li><li>[2022-03-18] <a href=https://twitter.com/jurasick/status/1504812649967767556>https://twitter.com/jurasick/status/1504812649967767556</a> (<a href=https://archive.ph/m9jG4>archive</a>)</li><li>[2022-03-18] <a href=https://twitter.com/JayChandran_/status/1504819850044092419>https://twitter.com/JayChandran_/status/1504819850044092419</a> (<a href=https://archive.ph/xSESm>archive</a>)</li></ul></p><p>The <tt>robtoledoyour.com</tt> domain is registered to an address in India. I find this notable, given that the first report of an <tt>alltimecaptaincool2019@gmail.com</tt> email impersonated Amazon.in and was posted in Reddit's /r/indiasocial forum. Also, the YouTube-style email mentions India-specific regulation. Finally, the domain was registered one month before the report on Reddit.</p><blog-section><h2 slot=title>Snapshots of WHOIS and DNS</h2><blog-code><pre>
$ whois robtoledoyour.com
% IANA WHOIS server
% for more information on IANA, visit http://www.iana.org
% This query returned 1 object
[...]
Domain Name: ROBTOLEDOYOUR.COM
Registry Domain ID: 2626055284_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.name.com
Registrar URL: http://www.name.com
Updated Date: 2021-07-12T06:25:22Z
Creation Date: 2021-07-12T06:25:22Z
Registrar Registration Expiration Date: 2022-07-12T06:25:22Z
Registrar: Name.com, Inc.
Registrar IANA ID: 625
Reseller:
Domain Status: clientTransferProhibited https://www.icann.org/epp#clientTransferProhibited
Registry Registrant ID: Not Available From Registry
Registrant Name: Natarajan K kannan
Registrant Organization:
Registrant Street: 79-1/43-1,Matha sannathi street
Registrant City: Tirunelveli
Registrant State/Province: TN
Registrant Postal Code: 627006
Registrant Country: IN
Registrant Phone: Non-Public Data
Registrant Email: https://www.name.com/contact-domain-whois/robtoledoyour.com/registrant
Registry Admin ID: Not Available From Registry
Admin Name: Natarajan K kannan
Admin Organization:
Admin Street: 79-1/43-1,Matha sannathi street
Admin City: Tirunelveli
Admin State/Province: TN
Admin Postal Code: 627006
Admin Country: IN
Admin Phone: Non-Public Data
Admin Email: https://www.name.com/contact-domain-whois/robtoledoyour.com/admin
Registry Tech ID: Not Available From Registry
Tech Name: Natarajan K kannan
Tech Organization:
Tech Street: 79-1/43-1,Matha sannathi street
Tech City: Tirunelveli
Tech State/Province: TN
Tech Postal Code: 627006
Tech Country: IN
Tech Phone: Non-Public Data
Tech Email: https://www.name.com/contact-domain-whois/robtoledoyour.com/tech
Name Server: ns1dns.name.com
Name Server: ns2fwz.name.com
Name Server: ns3bfm.name.com
Name Server: ns4clq.name.com
DNSSEC: unSigned
Registrar Abuse Contact Email: abuse@name.com
Registrar Abuse Contact Phone: +1.7203101849
URL of the ICANN WHOIS Data Problem Reporting System: http://wdprs.internic.net/
>>> Last update of WHOIS database: 2022-05-31T22:52:19Z <<<
</pre></blog-code><p></p><blog-code><pre>
$ dig robtoledoyour.com MX
[...]
;; ANSWER SECTION:
robtoledoyour.com. 300 IN MX 10 mail.redrool.com.
;; Query time: 134 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Jun 01 08:18:51 JST 2022
;; MSG SIZE rcvd: 75
</pre></blog-code><p>The MX domain <tt>mail.redrool.com</tt> is registered by NameCheap, doesn't have public WHOIS data, and was registered in 2013. If I had to speculate, I'd say this domain is unrelated and is merely being taken advantage of as an <a href=https://en.wikipedia.org/wiki/Open_mail_relay>open relay</a>.</p><blog-code><pre>
$ whois redrool.com
% IANA WHOIS server
% for more information on IANA, visit http://www.iana.org
% This query returned 1 object
[...]
Domain name: redrool.com
Registry Domain ID: 1827884879_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.namecheap.com
Registrar URL: http://www.namecheap.com
Updated Date: 2021-07-29T08:50:59.21Z
Creation Date: 2013-09-17T10:28:13.00Z
Registrar Registration Expiration Date: 2022-09-17T10:28:13.00Z
Registrar: NAMECHEAP INC
Registrar IANA ID: 1068
Registrar Abuse Contact Email: abuse@namecheap.com
Registrar Abuse Contact Phone: +1.9854014545
Reseller: NAMECHEAP INC
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
Registry Registrant ID:
Registrant Name: Redacted for Privacy
Registrant Organization: Privacy service provided by Withheld for Privacy ehf
Registrant Street: Kalkofnsvegur 2
Registrant City: Reykjavik
Registrant State/Province: Capital Region
Registrant Postal Code: 101
Registrant Country: IS
Registrant Phone: +354.4212434
Registrant Phone Ext:
Registrant Fax:
Registrant Fax Ext:
Registrant Email: a95454e67a0c42f988e530f0aeaa91d5.protect@withheldforprivacy.com
Registry Admin ID:
Admin Name: Redacted for Privacy
Admin Organization: Privacy service provided by Withheld for Privacy ehf
Admin Street: Kalkofnsvegur 2
Admin City: Reykjavik
Admin State/Province: Capital Region
Admin Postal Code: 101
Admin Country: IS
Admin Phone: +354.4212434
Admin Phone Ext:
Admin Fax:
Admin Fax Ext:
Admin Email: a95454e67a0c42f988e530f0aeaa91d5.protect@withheldforprivacy.com
Registry Tech ID:
Tech Name: Redacted for Privacy
Tech Organization: Privacy service provided by Withheld for Privacy ehf
Tech Street: Kalkofnsvegur 2
Tech City: Reykjavik
Tech State/Province: Capital Region
Tech Postal Code: 101
Tech Country: IS
Tech Phone: +354.4212434
Tech Phone Ext:
Tech Fax:
Tech Fax Ext:
Tech Email: a95454e67a0c42f988e530f0aeaa91d5.protect@withheldforprivacy.com
Name Server: ara.ns.cloudflare.com
Name Server: george.ns.cloudflare.com
DNSSEC: unsigned
URL of the ICANN WHOIS Data Problem Reporting System: http://wdprs.internic.net/
>>> Last update of WHOIS database: 2022-05-31T18:17:35.20Z <<<
</pre></blog-code></blog-section><blog-footnotes><hr><ol><li id=fn:1><p>Email was designed without any sort of security or authentication. I remember reading an IRC story, now lost, in which a student emails their professor from deadguy@yourhouse with the message "Help! I'm dead and I'm in your house!".</p></li></ol></blog-footnotes></blog-article>2022-06-01T01:36:59ZCompacting Lunr search indices2022-05-27T10:06:02Zurn:uuid:402b1c16-0087-4280-90c2-29fc17b067fc<style type=text/css>li{margin:.5em 0}p,li{line-height:1.5}</style><blog-article posted=2022-05-27T10:06:02Z><h1 slot=title>Compacting Lunr search indices</h1><div slot=tableofcontents></div><p><a href=https://lunrjs.com/>Lunr</a> is a small JavaScript library for full-text search, which I recently used to implement client-side search for this site. The user experience of client-side search depends in part on how large the search index is, and Lunr's default JSON encoding is more verbose than it needs to be. This page describes a more compact encoding that can reduce the serialized index size by about 40%.</p><blog-section><p>I'll be using the <a href=https://www.gutenberg.org>Project Gutenberg</a> editions of <a href=https://www.gutenberg.org/ebooks/2701>Moby Dick</a> and <a href=https://www.gutenberg.org/ebooks/1342>Pride and Prejudice</a> as the example search corpus, with the Project Gutenberg metadata and licensing information trimmed off.</p><blog-code syntax=commands><pre>
curl -L -O https://www.gutenberg.org/files/1342/1342-0.txt
curl -L -O https://www.gutenberg.org/files/2701/2701-0.txt
shasum -a 256 *.txt
# c3fc0e1900e233a0c3c6ca5784a54a3d3aaf00d40603315c644487bd7a07e22f 1342-0.txt
# 61d5ab6a3910fab66eabc9d2fc708b68b756199cb754fd5ff51751dbe5f766cd 2701-0.txt
tail -n +168 1342-0.txt | head -n 14060 > pride-and-prejudice.txt
tail -n +337 2701-0.txt | head -n 21624 > moby-dick.txt</pre></blog-code><p>A basic Lunr indexing program uses a <a href=https://lunrjs.com/docs/lunr.Builder.html><tt>lunr.Builder</tt></a> to assemble the index, then converts it to JSON with <tt>toJSON()</tt>.</p><blog-code syntax=javascript><pre>
import * as fs from "fs";
import lunr from "lunr";
function main(argv) {
if (argv.length < 3) {
console.error("usage: lunr-index FILES...");
process.exit(1);
}
const fileNames = argv.slice(2);
const idx = lunr((builder) => {
builder.ref("name");
builder.field("text");
builder.pipeline.remove(lunr.stemmer);
fileNames.forEach((fileName => {
builder.add({
name: fileName,
text: fs.readFileSync(fileName)
});
}));
});
process.stdout.write(JSON.stringify(indexToJSON(idx)));
}
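// indexToJSON() is the hook that the later revisions replace with compact encoders.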
function indexToJSON(idx) {
return idx.toJSON();
}
main(process.argv);</pre></blog-code><p>The resulting index JSON is about 1.68 MB raw, 250 KB with gzip, or 192 KB with Brotli. These two compression formats are useful because they can be consumed directly by common web browsers.</p><p>I also tried compressing it with zstd, which I expected to provide the best compression ratio, but in this case Brotli performed better.</p><blog-code syntax=commands><pre>
node lunr-index-v1.js moby-dick.txt pride-and-prejudice.txt > index-v1.json
gzip -9k index-v1.json
brotli -q 11 -w 24 index-v1.json
zstd -19 index-v1.json
# index-v1.json : 12.76% ( 1.61 MiB => 211 KiB, index-v1.json.zst)
ls -lS --si index-v1.*
# -rw-r--r-- 1 john john 1.7M May 27 16:08 index-v1.json
# -rw-r--r-- 1 john john 249k May 27 16:08 index-v1.json.gz
# -rw-r--r-- 1 john john 216k May 27 16:08 index-v1.json.zst
# -rw-r--r-- 1 john john 192k May 27 16:08 index-v1.json.br</pre></blog-code></blog-section><blog-section><h2>Compacting the inverted index</h2><p>The Lunr index JSON contains two big lists, <tt>fieldVectors</tt> and <tt>invertedIndex</tt>. Of those, the inverted index is far bigger, and has more opportunity for space savings.</p><div style=display:flex><blog-code syntax=json style=order:1;width:50%><pre>
{
"fields": ["text"],
"fieldVectors": [
[ "text/moby-dick.txt", [ /* ... */ ] ],
[ "text/pride-and-prejudice.txt", [ /* ... */ ] ] ],
"invertedIndex": [
[ "1", {
"_index": 1363,
"text": {
"moby-dick.txt": {},
"pride-and-prejudice.txt": {}
} } ],
[ "1,000,000", {
"_index": 7298,
"text": {
"moby-dick.txt": {}
} } ],
// ...
] }</pre></blog-code><div style=order:0;margin-right:1em;width:50%><p>Notice how much duplicate text there is:<ul><li>The index is semantically a list ordered by when the token was first observed, but is stored sorted by the token value. The <tt>_index</tt> property wouldn't be needed if the index was stored unsorted.</li><li>Field names (here, <tt>"text"</tt>) are repeated for each token. These could be indexes to the top-level <tt>fields</tt> property instead. Or, since the set of fields is bounded and small, the field indexes could be implied by list ordering.</li><li>Document references (such as <tt>"moby-dick.txt"</tt>) are also repeated, and when combined with field names could be indexes into the <tt>fieldVectors</tt> property.</li></ul></p></div></div><p>The following code applies these basic transformations to the <tt>invertedIndex</tt> property.</p><blog-code syntax=javascript><pre>
function indexToJSON(idx) {
const output = idx.toJSON();
output.invertedIndex = compactInvIndex(output);
return output;
}
function compactInvIndex(index) {
const fields = index["fields"];
const fieldVectorIdxs = new Map(index["fieldVectors"].map((v, idx) => {
return [v[0], idx];
}));
const items = new Map(index["invertedIndex"].map(item => {
const token = item[0];
const props = item[1];
const newItem = [token];
fields.forEach((field) => {
const fProps = props[field];
const matches = [];
Object.keys(fProps).forEach((docRef) => {
const fieldVectorIdx = fieldVectorIdxs.get(`${field}/${docRef}`);
if (fieldVectorIdx === undefined) {
throw new Error();
}
matches.push(fieldVectorIdx);
matches.push(fProps[docRef]);
});
newItem.push(matches);
});
return [props["_index"], newItem];
}));
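// Each entry is now keyed by its original _index; for the sample above,
//   1363 => ["1", [0, {}, 1, {}]]
// where 0 and 1 are positions in the top-level fieldVectors list.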
const indexes = Array.from(items.keys()).sort((a, b) => a - b);
const compacted = Array.from(indexes, (k) => {
const item = items.get(k);
if (item === undefined) {
throw new Error();
}
return item;
});
return compacted;
}</pre></blog-code><p>The raw (uncompressed) index size is reduced by over 50%, and compressed sizes by about 25%.</p><blog-code syntax=commands><pre>
node lunr-index-v2.js moby-dick.txt pride-and-prejudice.txt > index-v2.json
[...]
ls -lS --si index-v2.*
# -rw-r--r-- 1 john john 751k May 27 16:17 index-v2.json
# -rw-r--r-- 1 john john 188k May 27 16:17 index-v2.json.gz
# -rw-r--r-- 1 john john 162k May 27 16:17 index-v2.json.zst
# -rw-r--r-- 1 john john 145k May 27 16:17 index-v2.json.br</pre></blog-code></blog-section><blog-section><h2>Compacting the field vectors</h2><div style=display:flex><div style=margin-right:1em;width:50%><p>The field vectors are semantically a key-value map, where the keys are integer indexes into the inverted index. There isn't much here to optimize in terms of plain text, but a simple tweak can improve how compressible the data is.</p><p>Consider a file that is simply a big list of integers: <tt>1,2,3,4,6,7</tt> and so on. A generic compression function can take advantage of the limited character set, but doesn't have the semantic understanding to encode "next integer in sequence". However, if the original file is changed such that sequential integers use a static sentinel value, then repetitions of the sentinel can be greatly compressed.</p></div><blog-code syntax=json style=width:50%><pre>
"fieldVectors": [
[ "text/moby-dick.txt", [
0, 0.607,
1, 0.356,
2, 0.382,
// ...
] ],
[ "text/pride-and-prejudice.txt", [
1, 0.278,
2, 0.383,
6, 0.278,
// ...
] ] ]</pre></blog-code></div><p>For Lunr indexes, because the inverted index is expanded as new tokens are observed, there are likely to be long runs of sequential integers in the first "column" of the field vectors. They can be replaced with <tt>null</tt>.</p><blog-code syntax=javascript><pre>
function indexToJSON(idx) {
const output = idx.toJSON();
output.invertedIndex = compactInvIndex(output);
output.fieldVectors = compactVectors(output);
return output;
}
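// Rewrite runs of sequential keys as nulls, e.g.
//   [0, 0.607, 1, 0.356, 2, 0.382] -> [0, 0.607, null, 0.356, null, 0.382]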
function compactVectors(index) {
return index["fieldVectors"].map((item) => {
const id = item[0];
const vectors = item[1];
let prev = null;
const compacted = vectors.map((v, ii) => {
if (ii % 2 === 0) {
if (prev !== null && v === prev + 1) {
prev += 1;
return null;
}
prev = v;
}
return v;
});
return [id, compacted];
});
}</pre></blog-code><p>This optimization shaves another 20% to 25% from the compressed file sizes.</p><blog-code syntax=commands><pre>
node lunr-index-v3.js moby-dick.txt pride-and-prejudice.txt > index-v3.json
[...]
ls -lS --si index-v3.*
# -rw-r--r-- 1 john john 741k May 27 16:27 index-v3.json
# -rw-r--r-- 1 john john 137k May 27 16:27 index-v3.json.gz
# -rw-r--r-- 1 john john 120k May 27 16:27 index-v3.json.zst
# -rw-r--r-- 1 john john 118k May 27 16:27 index-v3.json.br</pre></blog-code></blog-section><blog-section><h2>Recovering the original index data</h2><p>Lunr can't directly consume the compacted form of its search index, so we need to reverse the above optimizations before calling <tt>lunr.Index.load()</tt>.</p><blog-code syntax=javascript><pre>
import * as fs from "fs";
function main(argv) {
if (argv.length !== 3) {
console.error("usage: lunr-index-expand FILE");
process.exit(1);
}
const compactIndex = JSON.parse(fs.readFileSync(argv[2]));
process.stdout.write(JSON.stringify(expand(compactIndex)));
}
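// Reverses both compaction passes: nulls become prev + 1, match lists become
// per-document maps, and the inverted index is re-sorted by token.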
function expand(compact) {
const fields = compact["fields"];
const fieldVectors = compact["fieldVectors"].map((item) => {
const id = item[0];
const vectors = item[1];
let prev = null;
const expanded = vectors.map((v, ii) => {
if (ii % 2 === 0) {
if (v === null) {
v = prev + 1;
}
prev = v;
}
return v;
});
return [id, expanded];
});
const invertedIndex = compact["invertedIndex"].map((item, itemIdx) => {
const token = item[0];
const fieldMap = {"_index": itemIdx};
fields.forEach((field, fieldIdx) => {
const matches = {};
let docRef = null;
item[fieldIdx + 1].forEach((v, ii) => {
if (ii % 2 === 0) {
docRef = fieldVectors[v][0].slice(`${field}/`.length);
} else {
matches[docRef] = v;
}
});
fieldMap[field] = matches;
})
return [token, fieldMap];
});
invertedIndex.sort((a, b) => {
if (a[0] < b[0]) {
return -1;
}
if (a[0] > b[0]) {
return 1;
}
return 0;
});
return {
"version": compact["version"],
"fields": fields,
"fieldVectors": fieldVectors,
"invertedIndex": invertedIndex,
"pipeline": compact["pipeline"],
};
}
main(process.argv);</pre></blog-code><p>The expanded output is identical to the original search index JSON.</p><blog-code syntax=commands><pre>
node lunr-index-expand.js index-v3.json > index-v3-expanded.json
shasum -a 256 index-v1.json index-v3-expanded.json
# a6f96e4046152213c0c41a12dc83522a85f91db6603c34cb8b85174efc3ade3f index-v1.json
# a6f96e4046152213c0c41a12dc83522a85f91db6603c34cb8b85174efc3ade3f index-v3-expanded.json</pre></blog-code></blog-section></blog-article>2022-05-27T10:06:02ZJSON is not a YAML subset2022-05-17T05:40:23Zurn:uuid:472c8484-6482-4130-b2a1-3aabf2e110a4<blog-article posted=2022-05-17T05:40:23Z><h1 slot=title>JSON is not a YAML subset</h1><div slot=tableofcontents></div><p>People on the internet believe that JSON is a subset of YAML, and that it's safe to parse JSON using a YAML parser:</p><div><img src=https://john-millikin.com/by-sha256/ee63b2b624eee575eb02a86aa9b30c99d82bedeb06090183be7eba440af4196a/stackoverflow-1726802.png style=max-width:800px;margin:.5em>
<img src=https://john-millikin.com/by-sha256/23fe072b5a626913777f0fd855468def024fac47d8050b32cab03c6e071f6418/hn-12797477.png style=max-width:800px;margin:.5em>
<img src=https://john-millikin.com/by-sha256/38c4831ec27268409254c256f98044fe61abf199cb4658459c251c50110845ef/hn-24371948.png style=max-width:800px;margin:.5em></div><p>Following this advice will end badly because JSON is not a subset of YAML. It is easy to construct JSON documents that (1) fail to parse as YAML, or (2) parse to valid but semantically different YAML. The second case is more dangerous because it's difficult to detect.</p><blog-section><h2>False has over "1.7e3" named fjords</h2><p>YAML (infamously) allows string scalars to be unquoted. A conforming YAML parser, presented with a token known to contain a scalar value, must match that token against a set of patterns and then <i>fall back</i> to treating it as a string. This behavior produces surprising outcomes, and has been named <a href=https://hitchdev.com/strictyaml/why/implicit-typing-removed/>The Norway Problem</a>.</p><blog-code syntax=commands prompt=">>" output-prefix=@><pre>
@$ irb-3.1.2
require 'yaml'
@=> true
YAML.load '[FI,NO,SE]'
@=> ["FI", false, "SE"]</pre></blog-code><p>A similar issue affects JSON documents passed to a YAML parser when dealing with numbers in exponential notation. The YAML 1.1 spec is stricter about the syntax of numbers than JSON: <tt>1e2</tt> is a valid JSON number, but YAML 1.1 requires it to be written as <tt>1.0e+2</tt>. Being an invalid number, the YAML parser will treat it as a string.</p><blog-code syntax=commands prompt=">>" output-prefix=@><pre>
@$ irb-3.1.2
require 'json'
@=> true
require 'yaml'
@=> true
JSON.load '{"a": 1e2}'
@=> {"a"=>100.0}
YAML.load '{"a": 1e2}'
@=> {"a"=>"1e2"}</pre></blog-code><p></p></blog-section><blog-section><h2>YAML 1.2 won't save you</h2><p>YAML 1.2 is a revision to the YAML spec that (among other goals) aims to make YAML a proper superset of JSON. To maintain backwards compatibility with existing YAML documents, the version is specified in a <tt>%YAML</tt> directive.</p><blog-code syntax=yaml><pre>
---
a: 1e2 # document["a"] == "1e2"
b: no # document["b"] == false
</pre></blog-code><p></p><blog-code syntax=yaml><pre>
%YAML 1.2
---
a: 1e2 # document["a"] == 100
b: no # document["b"] == "no"
</pre></blog-code><p>Regardless of whether YAML 1.2 has been (or will be) widely adopted, it does not help those who want to parse a JSON document with a YAML parser. JSON documents do not start with <tt>%YAML</tt>, and therefore cannot opt-in to the YAML parser behavior that would permit correct parsing of JSON.</p></blog-section></blog-article>2022-05-17T05:40:23ZStateless Kubernetes overlay networks with IPv62021-02-20T07:08:48Zurn:uuid:355f5ef2-a16c-4340-8290-46f74229b251<style type=text/css scoped>li{margin:.5em 0}p,li{line-height:1.5}tt{white-space:nowrap}table.packet{color:#000;background-color:#fff;width:600px;border:1px solid #000;border-collapse:collapse;table-layout:fixed;margin:0 auto}table.packet th{background-color:lightgrey}table.packet tr,table.packet td,table.packet th{border:1px solid #000;text-align:center}table.packet td{padding:2px}</style><blog-article posted=2021-02-20T07:08:48Z><h1 slot=title>Stateless Kubernetes overlay networks with IPv6</h1><div slot=summary><p>The <a href=https://kubernetes.io/docs/concepts/cluster-administration/networking/>Kubernetes network model</a> is typically implemented by an overlay network, which allows pods to have an IP address decoupled from the underlying fabric. There are dozens of different overlay network implementations that combine a stateful IPv4 address allocator with VXLAN as a transport layer. IPv4 overlay networks have a number of well-documented drawbacks, which contributes to Kubernetes' reputation as difficult to operate beyond small cluster sizes (~10,000 machines).</p><p>This page describes an overlay network based on stateless IPv6 tunnels, which have better reliability and scalability characteristics than stateful IPv4 overlays. It uses IETF protocols that are natively supported by the Linux kernel, and since it is independent of Kubernetes itself, it can support communication between processes both inside and outside of containers.</p></div><blog-section><h2 slot=title>Wire protocol</h2><table class=packet style="float:right;margin:0 0 3em 1em"><col style=width:32px><tbody><tr><td></td><th colspan=4>IPv4 header</th></tr><tr><th>0</th><td rowspan=2 colspan=4>(other IPv4 control fields)</td></tr><tr><th>4</th></tr><tr><th>8</th><td>TTL</td><td>IP protocol (UDP)</td><td colspan=2>IP checksum</td></tr><tr><th>12</th><td colspan=4>IPv4 source address</td></tr><tr><th>16</th><td colspan=4>IPv4 destination address</td></tr><tr><td></td><th colspan=4>UDP header</th></tr><tr><th>20</th><td colspan=2>source port</td><td colspan=2>destination port (3544)</td></tr><tr><th>24</th><td colspan=2>UDP length</td><td colspan=2>UDP checksum</td></tr><tr><td></td><th colspan=4>IPv6 header</th></tr><tr><th>28</th><td rowspan=2 colspan=4>IPv6 control fields</td></tr><tr><th>32</th></tr><tr><th>36</th><td rowspan=4 colspan=4>IPv6 source address</td></tr><tr><th>40</th></tr><tr><th>44</th></tr><tr><th>48</th></tr><tr><th>52</th><td rowspan=4 colspan=4>IPv6 destination address</td></tr><tr><th>56</th></tr><tr><th>60</th></tr><tr><th>64</th></tr><tr><td></td><th colspan=4>IPv6 payload</th></tr></tbody></table><p><a href=https://en.wikipedia.org/wiki/6to4>6to4</a> (<a href=https://tools.ietf.org/html/rfc3056>RFC 3056</a>) is a standard for routing IPv6 traffic over an IPv4 network. It was originally designed as part of the IPv6 migration strategy, allowing isolated IPv6-only networks to use existing internet infrastructure.
The protocol is extremely simple – the IPv6 packet is treated as an IPv4 payload, using protocol number 41.</p><p><a href=https://en.wikipedia.org/wiki/Teredo_tunneling>Teredo</a> (<a href=https://tools.ietf.org/html/rfc4380>RFC 4380</a>) extends 6to4 by adding a layer of UDP encapsulation, which can improve compatibility with intermediate network devices that have compatibility issues with non-TCP/UDP protocols. This page assumes use of Teredo, but if the underlying network allows 6to4 (protocol 41) then the UDP encap can be turned off to save 8 bytes per packet.</p><p>The Linux kernel has built-in support for creating 6to4 tunnels in the <tt>sit</tt> driver. Such tunnels can optionally use the Teredo protocol by enabling the <a href=https://lwn.net/Articles/614348/>Foo Over UDP</a> (FOU) mode, which is a setting for Linux tunnel drivers that encapsulates packets in UDP. FOU computes synthetic source ports for outbound packets based on the encapsulated packet's connection tuple, thus allowing intermediate routers to distinguish underlying streams (e.g. for link aggregation or flow control).</p></blog-section><blog-section><h2 slot=title>Pod address allocation</h2><p>The 6to4 wire protocol describes how to encapsulate IPv6 packets, but doesn't mandate how IPv6 addresses should be assigned or how a router should calculate the IPv4 address of the destination<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref>. For that we can use <a href=https://en.wikipedia.org/wiki/IPv6_rapid_deployment>6rd</a> (<a href=https://tools.ietf.org/html/rfc5969>RFC 5969</a>), which is a flexible embedding of the IPv4 address space into IPv6.</p><p>Allocating pod addresses with 6rd has a number of helpful properties:</p><ul><li>Given a pod's IPv6 address, its host's IPv4 address can be computed mechanically by the kernel. There is no userspace routing component.</li><li>Each host IPv4 address maps to a 64-bit IPv6 network prefix. Pod IPs can be allocated from this prefix by the CNI <tt>host-local</tt> IPAM plugin without any risk of conflict.</li><li>IPv6 addresses can be allocated from a <a href=https://en.wikipedia.org/wiki/Unique_local_address>Unique Local Address</a> (<a href=https://tools.ietf.org/html/rfc4193>RFC 4193</a>) range, which is similar to IPv4 private address ranges (e.g. 10.0.0.0/8).</li><li>A host's IPv4 address can have its high bits masked off, which is useful when every IPv4 address is being allocated from the same CIDR block (e.g. a private network).</li></ul><p>Unfortunately the 6rd functionality of iproute2 is not well documented, and the error messages are opaque netlink error codes. When in doubt, I recommend examining the iproute2 and Linux kernel source code to understand how <tt>ip tunnel 6rd</tt> commands map to netlink parameters.</p></blog-section><blog-section><h2 slot=title>Setting up a 6to4 overlay</h2><blog-section><h3 slot=title>Generate a network prefix</h3><p>To use a ULA range as a 6rd prefix, each IPv4 address must be masked to 16 bits or less. For this page I'll be using IPv4 addresses in the 10.0.0.0/8 range, which masks to 24 bits, so the ULA prefix needs to have its length fudged a bit (40 bits => 32).</p><blog-code syntax=commands><pre>
python3 -c 'import os; print("".join("%02x" % b for b in os.urandom(4)))'
# 8ce4b05e</pre></blog-code><p>Converting this to ULA yields an IPv6 network prefix of <tt>fd8c:e4b0:5e00::/40</tt>.</p></blog-section><blog-section><h3 slot=title>Create the SIT interface</h3><p>I'm not going to be stepping through each of these commands, so for folks not familiar with Linux networking I recommend opening the <a href=https://man7.org/linux/man-pages/man8/ip-link.8.html>ip-link(8)</a> and <a href=https://man7.org/linux/man-pages/man8/ip-tunnel.8.html>ip-tunnel(8)</a> manpages to follow along. The only thing to note is that the SIT interface is being created <i>without</i> a remote address – this is an overlay, not a tunnel.</p><p>There are two machines here, <tt>node-a</tt> and <tt>node-b</tt>, which will be set up with identical configuration (adjusted for their different IPv4 addresses):</p><ul><li>IPv4 address <tt>10.1.1.100</tt> maps to IPv6 prefix <tt>fd8c:e4b0:5e01:0164::/40</tt>.</li><li>IPv4 address <tt>10.1.1.101</tt> maps to IPv6 prefix <tt>fd8c:e4b0:5e01:0165::/40</tt>.</li></ul><blog-code syntax=commands prompt=root@node-a:~#><pre>
ip addr show ens37
# 3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
# link/ether 00:50:56:36:1e:3d brd ff:ff:ff:ff:ff:ff
# inet 10.1.1.100/24 brd 10.1.1.255 scope global ens37
# valid_lft forever preferred_lft forever
# inet6 fe80::250:56ff:fe36:1e3d/64 scope link
# valid_lft forever preferred_lft forever
ip tunnel add kubetunnel0 \
# mode sit \
# local '10.1.1.100' \
# ttl 64
ip tunnel 6rd dev kubetunnel0 \
# 6rd-prefix 'fd8c:e4b0:5e00::/40' \
# 6rd-relay_prefix '10.0.0.0/8'
ip -6 addr add 'fd8c:e4b0:5e01:0164::1/40' dev kubetunnel0
ip link set kubetunnel0 up
ip -6 addr delete '::10.1.1.100/96' dev kubetunnel0
ip addr show dev kubetunnel0
# 6: kubetunnel0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
# link/sit 10.1.1.100 brd 0.0.0.0
# inet6 fd8c:e4b0:5e01:164::1/40 scope global
# valid_lft forever preferred_lft forever</pre></blog-code><p> </p><blog-code syntax=commands prompt=root@node-b:~#><pre>
ip addr show ens37
# 3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
# link/ether 00:50:56:3f:94:51 brd ff:ff:ff:ff:ff:ff
# inet 10.1.1.101/24 brd 10.1.1.255 scope global ens37
# valid_lft forever preferred_lft forever
# inet6 fe80::250:56ff:fe3f:9451/64 scope link
# valid_lft forever preferred_lft forever
ip tunnel add kubetunnel0 \
# mode sit \
# local '10.1.1.101' \
# ttl 64
ip tunnel 6rd dev kubetunnel0 \
# 6rd-prefix 'fd8c:e4b0:5e00::/40' \
# 6rd-relay_prefix '10.0.0.0/8'
ip -6 addr add 'fd8c:e4b0:5e01:0165::1/40' dev kubetunnel0
ip link set kubetunnel0 up
ip -6 addr delete '::10.1.1.101/96' dev kubetunnel0
ip addr show dev kubetunnel0
# 5: kubetunnel0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
# link/sit 10.1.1.101 brd 0.0.0.0
# inet6 fd8c:e4b0:5e01:165::1/40 scope global
# valid_lft forever preferred_lft forever</pre></blog-code></blog-section><blog-section><h3 slot=title>Test 6to4 functionality</h3><p>Before going further, let's take the <tt>kubetunnel0</tt> devices for a spin and make sure they're able to route packets. Any protocol encapsulated by IPv6 should work (here I test ICMPv6, TCP, and UDP).</p><blog-code syntax=commands prompt=root@node-a:~#><pre>
ping6 -c 1 fd8c:e4b0:5e01:165::1
# PING fd8c:e4b0:5e01:165::1(fd8c:e4b0:5e01:165::1) 56 data bytes
# 64 bytes from fd8c:e4b0:5e01:165::1: icmp_seq=1 ttl=64 time=0.456 ms
#
# --- fd8c:e4b0:5e01:165::1 ping statistics ---
# 1 packets transmitted, 1 received, 0% packet loss, time 0ms
# rtt min/avg/max/mdev = 0.456/0.456/0.456/0.000 ms</pre></blog-code><p></p><blog-code syntax=commands prompt=root@node-b:~#><pre>
nc -6lvn -p 1234
# Listening on [::] (family 10, port 1234)</pre></blog-code><p></p><blog-code syntax=commands prompt=root@node-a:~#><pre>
echo 'Hello, world!' | nc -N fd8c:e4b0:5e01:165::1 1234</pre></blog-code><p></p><blog-code syntax=commands prompt=root@node-b:~#><pre>
nc -6lvn -p 1234
# Listening on [::] (family 10, port 1234)
# Connection from fd8c:e4b0:5e01:164::1 39182 received!
# Hello, world!</pre></blog-code><p>If anything goes wrong – for example UDP works but TCP doesn't – you can use pretty much any packet capture tool to debug the overlay. Since 6to4 is a widely-deployed protocol, tools such as <tt>tcpdump</tt> know how to de-encapsulate the underlying flows.</p><blog-code syntax=commands prompt=root@node-b:~#><pre>
tcpdump -i ens37 -nn --no-promiscuous-mode
# listening on ens37, link-type EN10MB (Ethernet), capture size 262144 bytes
# 05:04:16.648264 IP 10.1.1.100 > 10.1.1.101: IP6 fd8c:e4b0:5e01:164::1.39182 > fd8c:e4b0:5e01:165::1.1234: Flags [S], seq 1134988924, win 65320, options [mss 1420,sackOK,TS val 3326468574 ecr 0,nop,wscale 7], length 0
# 05:04:16.648496 IP 10.1.1.101 > 10.1.1.100: IP6 fd8c:e4b0:5e01:165::1.1234 > fd8c:e4b0:5e01:164::1.39182: Flags [S.], seq 2975133435, ack 1134988925, win 64768, options [mss 1420,sackOK,TS val 3023145745 ecr 3326468574,nop,wscale 7], length 0
# 05:04:16.648794 IP 10.1.1.100 > 10.1.1.101: IP6 fd8c:e4b0:5e01:164::1.39182 > fd8c:e4b0:5e01:165::1.1234: Flags [.], ack 1, win 511, options [nop,nop,TS val 3326468575 ecr 3023145745], length 0
# 05:04:16.648889 IP 10.1.1.100 > 10.1.1.101: IP6 fd8c:e4b0:5e01:164::1.39182 > fd8c:e4b0:5e01:165::1.1234: Flags [P.], seq 1:15, ack 1, win 511, options [nop,nop,TS val 3326468575 ecr 3023145745], length 14
# 05:04:16.648906 IP 10.1.1.101 > 10.1.1.100: IP6 fd8c:e4b0:5e01:165::1.1234 > fd8c:e4b0:5e01:164::1.39182: Flags [.], ack 15, win 506, options [nop,nop,TS val 3023145746 ecr 3326468575], length 0
# 05:04:16.648982 IP 10.1.1.100 > 10.1.1.101: IP6 fd8c:e4b0:5e01:164::1.39182 > fd8c:e4b0:5e01:165::1.1234: Flags [F.], seq 15, ack 1, win 511, options [nop,nop,TS val 3326468575 ecr 3023145745], length 0
# 05:04:16.649088 IP 10.1.1.101 > 10.1.1.100: IP6 fd8c:e4b0:5e01:165::1.1234 > fd8c:e4b0:5e01:164::1.39182: Flags [F.], seq 1, ack 16, win 506, options [nop,nop,TS val 3023145746 ecr 3326468575], length 0
# 05:04:16.649677 IP 10.1.1.100 > 10.1.1.101: IP6 fd8c:e4b0:5e01:164::1.39182 > fd8c:e4b0:5e01:165::1.1234: Flags [.], ack 2, win 511, options [nop,nop,TS val 3326468576 ecr 3023145746], length 0
#
# 8 packets captured
# 8 packets received by filter
# 0 packets dropped by kernel</pre></blog-code></blog-section><blog-section><h3 slot=title>Enable Teredo mode (UDP encapsulation)</h3><p>Since Teredo is 6to4 in UDP, we enable FOU mode to turn a 6to4 overlay into a Teredo overlay. FOU mode can be configured to use any destination port – I'm using 3544 because that's the official Teredo port, and it helps packet capture tools figure out what's going on.</p><blog-code syntax=commands prompt=#><pre>
modprobe fou
ip fou add port 3544 ipproto 41
ip link set \
# name kubetunnel0 \
# type sit \
# encap fou \
# encap-sport auto \
# encap-dport 3544</pre></blog-code><p>Note that Teredo support is not as widespread as 6to4. In particular, <tt>tcpdump</tt> doesn't know how to de-encapsulate it.</p><blog-code syntax=commands prompt=root@node-b:~#><pre>
tcpdump -i ens37 -nn --no-promiscuous-mode
# listening on ens37, link-type EN10MB (Ethernet), capture size 262144 bytes
# 05:35:26.040293 IP 10.1.1.100.54772 > 10.1.1.101.3544: UDP, length 80
# 05:35:26.040348 IP 10.1.1.101.54181 > 10.1.1.100.3544: UDP, length 80
# 05:35:26.040868 IP 10.1.1.100.54772 > 10.1.1.101.3544: UDP, length 72
# 05:35:26.041134 IP 10.1.1.100.54772 > 10.1.1.101.3544: UDP, length 86
# 05:35:26.041140 IP 10.1.1.100.54772 > 10.1.1.101.3544: UDP, length 72
# 05:35:26.041185 IP 10.1.1.101.54181 > 10.1.1.100.3544: UDP, length 72
# 05:35:26.041412 IP 10.1.1.101.54181 > 10.1.1.100.3544: UDP, length 72
# 05:35:26.042305 IP 10.1.1.100.54772 > 10.1.1.101.3544: UDP, length 72
#
# 8 packets captured
# 8 packets received by filter
# 0 packets dropped by kernel</pre></blog-code><p>Wireshark works fine.</p><img src=https://john-millikin.com/by-sha256/f20ec1f4bbad03d112370e7bfedc6b5a303c93972c9f8a1cf0865681dae66ad0/Screen%20Shot%202021-01-31%20at%2022.39.23.png style=max-width:1024px></blog-section></blog-section>
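<p>As a sanity check, the 6rd address mapping can be reproduced in a few lines of Python. This is a minimal sketch assuming the <tt>fd8c:e4b0:5e00::/40</tt> prefix and <tt>10.0.0.0/8</tt> relay prefix used above; the <tt>sixrd_prefix</tt> helper is my own name, not part of iproute2 or the kernel.</p><blog-code syntax=python><pre>
import ipaddress

def sixrd_prefix(ipv4, prefix="fd8c:e4b0:5e00::", prefix_len=40, relay_len=8):
    # Keep the low (32 - relay_len) bits of the host's IPv4 address.
    v4_bits = int(ipaddress.IPv4Address(ipv4)) & ((1 << (32 - relay_len)) - 1)
    # Splice those bits in directly after the 6rd prefix.
    shift = 128 - prefix_len - (32 - relay_len)
    return ipaddress.IPv6Address(int(ipaddress.IPv6Address(prefix)) | (v4_bits << shift))

print(sixrd_prefix("10.1.1.100"))  # fd8c:e4b0:5e01:164::
print(sixrd_prefix("10.1.1.101"))  # fd8c:e4b0:5e01:165::</pre></blog-code>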
<blog-section><h2 slot=title>Persistent network configuration</h2><blog-section><h3 slot=title>Debian (ifupdown)</h3><p>Create a file named <tt>/etc/network/interfaces.d/kubetunnel0</tt> in <a href=https://manpages.debian.org/stretch/ifupdown/interfaces.5.en.html>interfaces(5)</a> format. These commands are the same ones run earlier by hand.</p><p>If you don't want to use templating to inject the right local IPv4 address, or need something dynamic (e.g. if the host IPv4 is from DHCP), then move the commands into a helper binary and invoke it from this config file.</p><blog-code><pre>
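# Mirrors the manual "ip tunnel" setup above; the local IPv4 address and
# 6rd mapping are per-host values.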
auto kubetunnel0
iface kubetunnel0 inet6 manual
pre-up ip tunnel add "${IFACE}" \
mode sit \
local '10.1.1.100' \
ttl 64
pre-up ip tunnel 6rd dev "${IFACE}" \
6rd-prefix 'fd8c:e4b0:5e00::/40' \
6rd-relay_prefix '10.0.0.0/8'
pre-up ip -6 addr add 'fd8c:e4b0:5e01:0164::1/40' dev "${IFACE}"
up ip link set "${IFACE}" up
post-up ip -6 addr delete '::10.1.1.100/96' dev "${IFACE}"
down ip link set "${IFACE}" down
post-down ip tunnel delete "${IFACE}"</pre></blog-code><p>You may also want to create a bridge device, so that pods can have NAT'd IPv4 IPs. This lets them talk to existing infrastructure that isn't part of the overlay network. I'll use <tt>192.168.1.1/24</tt> as the NAT range in this example. Put the following into <tt>/etc/network/interfaces.d/kubebridge0</tt>.</p><blog-code><pre>
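# Bridge for NAT'd pod IPv4 traffic; pods also get IPv6 addresses from the
# host's overlay prefix (see the CNI config below).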
auto kubebridge0
iface kubebridge0 inet manual
pre-up \
iptables -t nat -C POSTROUTING -s 192.168.1.1/24 -j MASQUERADE || \
iptables -t nat -A POSTROUTING -s 192.168.1.1/24 -j MASQUERADE
pre-up ip link add name "${IFACE}" type bridge
pre-up ip addr add 192.168.1.1/24 brd + dev "${IFACE}"
pre-up ip -6 addr add fd8c:e4b0:5e01:0164::1:1/112 dev "${IFACE}"
up ip link set "${IFACE}" up
down ip link set "${IFACE}" down
post-down ip link delete "${IFACE}"</pre></blog-code></blog-section></blog-section><blog-section><h2 slot=title>Kubelet configuration</h2><p>After creating the interface, you still need to make the Kubelet use it for pod networking. Create a file <tt>/etc/cni/net.d/10-kubernetes-overlay.conf</tt>, using the <a href=https://www.cni.dev/plugins/ipam/host-local/>host-local</a> IPAM plugin to allocate addresses out of the host's IPv6 prefix:</p><ul><li>The <tt>"ranges"</tt> section sets which subnets to use for pod IPs. The following example allocates two IPs, one from the IPv6 overlay and one from the IPv4 bridge NAT.</li><li>The <tt>"routes"</tt> section configures the pod's network namespace to use the bridge for outbound packets.</li></ul><p>Here there are fewer options if you want to avoid templating the config file. You might need to write a custom CNI plugin that queries the network state and invokes the other CNI binaries.</p><blog-code syntax=json><pre>{
"cniVersion": "0.3.1",
"name": "kubernetes-overlay",
"type": "bridge",
"bridge": "kubebridge0",
"hairpinMode": true,
"ipam": {
"type": "host-local",
"ranges": [
[ { "subnet": "fd8c:e4b0:5e01:0164::1:0/112" } ],
[ { "subnet": "192.168.1.0/24" } ]
],
"routes": [
{ "dst": "0.0.0.0/0", "gw": "192.168.1.1" },
{ "dst": "fd8c:e4b0:5e00::/40", "gw": "fd8c:e4b0:5e01:0164::1:1" }
],
"dataDir": "/var/run/cni/networks/kubernetes-overlay"
}
}</pre></blog-code></blog-section><blog-section><h2 slot=title>Other notes</h2><blog-section><h3 slot=title>Jumbo packets</h3><p>If you're using jumbo packets on your network, be aware that the kernel creates <i>two</i> SIT interfaces: <tt>kubetunnel0</tt> and <tt>sit0</tt>.</p><blog-code syntax=commands prompt=#><pre>
ip addr show sit0
# 4: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
# link/sit 0.0.0.0 brd 0.0.0.0</pre></blog-code><p>This mostly doesn't matter, except that some <tt>sit0</tt> settings (including MTU) seem to affect all SIT tunnels on the machine. You'll want to adjust the <tt>sit0</tt> and <tt>kubetunnel0</tt> MTUs at the same time during interface creation. The MTU of <tt>sit0</tt> should match the physical interface.</p><blog-code syntax=commands prompt=#><pre>
ip link set dev kubetunnel0 mtu 8950
ip link set dev sit0 mtu 9001</pre></blog-code></blog-section></blog-section><blog-footnotes><hr><ol><li id=fn:1><p><a href=https://tools.ietf.org/html/rfc3068>RFC 3068</a> reserved the <tt>2002::/16</tt> anycast prefix for 6to4 tunnels, so that each IPv4 address would convert to a 48-bit "routing prefix". This original scheme wasn't widely adopted because the user experience depends on both sides of the connection having access to high-quality tunnel routers. <a href=https://tools.ietf.org/html/rfc7526>RFC 7526</a> officially deprecated the <tt>2002::/16</tt> prefix.</p></li></ol></blog-footnotes></blog-article>2021-02-20T07:08:48ZExtending VSCode with WebAssembly2020-12-30T03:15:46Zurn:uuid:f01bd1b8-8608-4afc-a4ab-9964dd5a00d4<style type=text/css scoped>li{margin:.5em 0}p,li{line-height:1.5}tt{white-space:nowrap}</style><blog-article posted=2020-12-28T11:57:02Z><h1 slot=title>Extending VSCode with WebAssembly</h1><div slot=tableofcontents></div><p>Two years ago I filed <a href=https://github.com/Microsoft/vscode/issues/65559>Microsoft/vscode#65559</a> asking for WebAssembly support in VSCode extensions. At the time, WASM was supported by Node.JS but the <tt>WebAssembly</tt> symbol wasn't available in the extension's evaluation scope. That issue didn't get much activity from upstream but the other day I tried it again, and … it worked!</p><p>Below is a small "hello world" LSP-based extension that loads a WASM module in <tt>onInitialize()</tt>. It uses the <a href=https://yarnpkg.com/package/vscode-languageserver>vscode-languageserver</a> library; readers new to VSCode extensions can follow along using Microsoft's <a href=https://code.visualstudio.com/api/get-started/your-first-extension>Your First Extension</a> and <a href=https://code.visualstudio.com/api/language-extensions/language-server-extension-guide>Language Server Extension Guide</a> tutorials.</p><div style=text-align:center><img src=https://john-millikin.com/by-sha256/fb136f5feeb60d7cafaaa1fda1967602f097fec4644db18c69d22b62980fcd12/Screen%20Shot%202020-12-28%20at%2017.11.18.png style=max-width:800px;margin:2em></div><blog-section><h2>server.wasm</h2><p>First up, we'll need the WASM file itself. I wrote two flavors (C and Rust) with equivalent API, returning a single static string. More complex APIs can use Emscripten or wasm-bindgen or whatever to deal with the FFI.</p><p>Option 1: C</p><blog-code syntax=c><pre>
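/* The returned pointer is an offset into the module's linear memory;
 * the host reads the bytes through the exported "memory". */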
char *greeting() {
return "Hello world (from C)!";
}</pre></blog-code><p></p><blog-code syntax=commands><pre>
clang --target=wasm32 \
# --no-standard-libraries \
# -Wl,--export-all \
# -Wl,--no-entry \
# -o out/server.wasm \
# src/server.c</pre></blog-code><p>Option 2: Rust</p><blog-code syntax=rust><pre>
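// NUL-terminated so the host can find the end of the string in linear memory.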
#![no_std]
#[no_mangle]
pub extern "C" fn greeting() -> *const u8 {
const HELLO: &'static str = "Hello world (from Rust)!\0";
HELLO.as_ptr()
}
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
loop {}
}
</pre></blog-code><p></p><blog-code syntax=commands><pre>
rustc -O \
    --target wasm32-unknown-unknown \
    --crate-type=cdylib \
    -o out/server.wasm \
    src/server.rs</pre></blog-code><p>Either way you'll get a WASM module looking something like this:</p><blog-code syntax=commands><pre>
wasm2wat out/server.wasm
# (module
# (type (;0;) (func (result i32)))
# (func $greeting (type 0) (result i32)
# i32.const 1048576)
# (table (;0;) 1 1 funcref)
# (memory (;0;) 17)
# (global (;0;) (mut i32) (i32.const 1048576))
# (global (;1;) i32 (i32.const 1048601))
# (global (;2;) i32 (i32.const 1048601))
# (export "memory" (memory 0))
# (export "greeting" (func $greeting))
# (export "__data_end" (global 1))
# (export "__heap_base" (global 2))
# (data (;0;) (i32.const 1048576) "Hello world (from Rust)!\00"))</pre></blog-code></blog-section><blog-section><h2>server.ts</h2><p>This file implements the server side of an LSP-based VSCode extension. For the sake of brevity I won't have it implement any handlers, so it just loads the WASM module and sends a message once successfully initialized.</p><blog-code syntax=javascript><pre>
import * as fs from "fs";
import * as path from "path";
import {
    createConnection,
    ProposedFeatures,
    InitializeParams,
    TextDocumentSyncKind,
    InitializeResult
} from 'vscode-languageserver/node';
declare const WebAssembly: any;
</pre></blog-code><p>The only unusual part is the <tt>declare</tt>, which is necessary because TypeScript doesn't have type definitions for standalone WASM yet (<a href=https://github.com/DefinitelyTyped/DefinitelyTyped/issues/48648>DefinitelyTyped/DefinitelyTyped#48648</a>). Since the WebAssembly API is small we can stub out the type checks.</p><p>Next up is a helper class to load the module from disk, compile it, and call its exported functions.</p><blog-code syntax=javascript><pre>
class ServerImpl {
    instance: any;

    constructor(instance: any) {
        this.instance = instance;
    }

    // Read the compiled module from disk, then compile and instantiate it.
    public static instantiate(): Promise<ServerImpl> {
        const wasmPath = path.resolve(__dirname, "server.wasm");
        return new Promise((resolve, reject) => {
            fs.readFile(wasmPath, (err, data) => {
                if (err) {
                    reject(err);
                    return;
                }
                const buf = new Uint8Array(data);
                resolve(WebAssembly.instantiate(buf, {})
                    .then((result: any) => (new ServerImpl(result.instance)))
                );
            });
        });
    }

    // greeting() returns an offset into the module's linear memory; copy
    // bytes out of the exported memory until the NUL terminator.
    public greeting(): string {
        const exports = this.instance.exports;
        const result_off = exports.greeting();
        const result_ptr = new Uint8Array(exports.memory.buffer, result_off);
        let result = "";
        for (let ii = 0; result_ptr[ii]; ii++) {
            result += String.fromCharCode(result_ptr[ii]);
        }
        return result;
    }
}
</pre></blog-code><p>Lastly, the server startup and initialization logic, which calls into the helper to fetch the greeting string.</p><blog-code syntax=javascript><pre>
let impl: ServerImpl;
let connection = createConnection(ProposedFeatures.all);

connection.onInitialize((params: InitializeParams) => {
    const result: InitializeResult = {
        capabilities: {
            textDocumentSync: TextDocumentSyncKind.Incremental,
        }
    };
    // Delay the InitializeResult until the WASM module has loaded.
    return ServerImpl.instantiate()
        .then((loadedImpl: ServerImpl) => {
            impl = loadedImpl;
            return result;
        });
});

// onInitialized fires after the client acknowledges initialization, so
// `impl` is always set by the time greeting() is called here.
connection.onInitialized(() => {
    connection.window.showInformationMessage(`greeting: ${impl.greeting()}`);
});

connection.listen();
</pre></blog-code></blog-section><p>When the "Hello World" command is invoked via the command menu, the extension will be initialized and the greeting will pop up.</p></blog-article>2020-12-30T03:15:46ZNotes on cross-compiling Rust2020-12-12T09:34:16Zurn:uuid:f29c4af8-f526-4d49-bb46-b9f6a96ae93f<style type=text/css scoped>li{margin:.5em 0}p,li{line-height:1.5}tt{white-space:nowrap}</style><blog-article posted=2020-12-12T09:34:16Z><h1 slot=title>Notes on cross-compiling Rust</h1><div slot=summary><div style="float:right;padding:0 0 0 2em"><img src=https://john-millikin.com/by-sha256/a09f17ed006ab5ca090818f22f7efdbc9eafa5ccf28bc9b85429733bf4960ebb/raspberry-pi.jpg style=width:400px;height:400px></div><p>One of my current hobby projects involves running Rust binaries on a Raspberry Pi. There are three computers involved: the Pi itself (ARMv7 Linux), my desktop (x86-64 Linux), and sometimes my laptop (x86-64 macOS).</p><p>The release of Cyberpunk 2077 means that my desktop will be spending more time booted into Windows, so I needed to figure out how to get the macOS machine to build binaries for ARMv7 Linux. I had hoped this would be straightforward because <tt>rustc</tt> is a native cross-compiler, and I've had good experiences with cross-compiling other modern languages (e.g. Go).</p><p>Unfortunately when I did a websearch for [cross-compiling rust] the results were universally terrible<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref>. This page contains my notes on how to get cross-compilation working with either Cargo or Bazel, plus some suggestions for the rustup and rules_rust projects that could make cross-compilation simpler in the future.</p></div><blog-section><h2 slot=title>Background</h2><p>In the early days of software engineering, when high-level languages like C were just starting to displace assembly, compilers used build-time configuration to select a target platform. This meant that any given build of the compiler could only generate object code for a single platform. The concept of <a href=https://en.wikipedia.org/wiki/Cross_compiler>cross-compilation</a> was introduced to describe compilers that could be built to run on Platform A but generate object code for Platform B.</p><p>Times change, and nowadays every major compiler<blog-footnote-ref>[<a href=#fn:2>2</a>]</blog-footnote-ref> is what's called a "native cross compiler", allowing the target platform to be selected at runtime (e.g. with a CLI flag). This includes the Rust compiler <tt>rustc</tt>, which as of v1.48 supports well over a hundred distinct targets.</p><blog-code syntax=commands><pre>
rustc --version
# rustc 1.48.0 (7eac88abb 2020-11-16)
rustc --print target-list | wc -l
# 156
rustc --print target-list | sort -R | head -n 10 | sort
# aarch64-apple-darwin
# i686-uwp-windows-msvc
# msp430-none-elf
# powerpc-unknown-linux-gnuspe
# powerpc-wrs-vxworks
# sparc64-unknown-linux-gnu
# sparc64-unknown-openbsd
# thumbv4t-none-eabi
# thumbv7a-pc-windows-msvc
# x86_64-pc-windows-msvc</pre></blog-code><p>In practice cross-compilation requires more than simply generating object code, but with a bit of effort from the toolchain developers it's possible to make this nearly seamless. Go is the gold standard here; it ships its own linker and the sources for its standard library, so a normal installation can directly build executables for any supported target.</p></blog-section><blog-section><h2 slot=title>Rustup and Cargo</h2><div style="float:right;padding:0 0 0 2em"><img src=https://john-millikin.com/by-sha256/b049b899f6e55fbbd9a80a31a44c7689068b1ac7050ec5a1a6d425e50cfde69f/Cargo-Logo-Small.png style=width:306px;height:275px></div><p>The first build tool I tried is <a href=https://github.com/rust-lang/cargo>Cargo</a>, which I installed with <a href=https://rustup.rs/>rustup</a>. I dislike building with Cargo because it's primitive and inflexible, but since it's the official Rust build tool I hoped it would be the best documented.</p><div style=padding-right:330px><blog-code syntax=toml><pre>
# Cargo.toml
[package]
name = "helloworld"
version = "0.0.1"
edition = "2018"
[[bin]]
name = "helloworld"
path = "helloworld.rs"
</pre></blog-code></div><p>Cargo uses the <tt>--target</tt> flag to enable cross-compilation.</p><blog-code syntax=commands><pre>
cargo build --target armv7-unknown-linux-gnueabihf
# Compiling helloworld v0.0.1 (/Users/john/src/rust-cross-compilation)
# error[E0463]: can't find crate for `std`
# |
# = note: the `armv7-unknown-linux-gnueabihf` target may not be installed</pre></blog-code><p>Whereas Go will build its standard library from source when cross-compiling, Rust relies on precompiled libraries<blog-footnote-ref>[<a href=#fn:3>3</a>]</blog-footnote-ref>. We can use <tt>rustup</tt> to fetch a prebuilt <tt>std</tt> for Linux on ARMv7.</p><blog-code syntax=commands><pre>
rustup target add armv7-unknown-linux-gnueabihf
# info: downloading component 'rust-std' for 'armv7-unknown-linux-gnueabihf'
# info: installing component 'rust-std' for 'armv7-unknown-linux-gnueabihf'
# info: using up to 500.0 MiB of RAM to unpack components
# 18.2 MiB / 18.2 MiB (100 %) 11.5 MiB/s in 1s ETA: 0s</pre></blog-code><p></p><blog-code syntax=commands><pre>
cargo build --target armv7-unknown-linux-gnueabihf
# Compiling helloworld v0.0.1 (/Users/john/src/rust-cross-compilation)
# error: linking with `cc` failed: exit code: 1
# |
# = note: "cc" "-Wl,--as-needed" "-Wl,-z,noexecstack" "-Wl,--eh-frame-hdr" "-L"
# [...]
# "-Wl,-Bdynamic" "-lgcc_s" "-lc" "-lm" "-lrt" "-lpthread" "-lutil" "-ldl" "-lutil"
# = note: clang: warning: argument unused during compilation: '-pie' [-Wunused-command-line-argument]
# ld: unknown option: --as-needed
# clang: error: linker command failed with exit code 1 (use -v to see invocation)</pre></blog-code><p>The source file was successfully compiled, but it couldn't be linked into an executable. It looks like Cargo is trying to use the host system's linker, which will sometimes work, but fails in this particular case because the macOS linker only supports Apple targets.</p><p>Luckily the LLVM project, in addition to the compilation framework, also distributes the cross-platform <a href=https://lld.llvm.org/>LLD</a> linker. While it doesn't cover every platform supported by <tt>rustc</tt>, it does support the common ones. We can configure Cargo to use it for linking our ARMv7 Linux binary.</p><p>I downloaded <a href=https://github.com/llvm/llvm-project/releases/download/llvmorg-11.0.0/clang+llvm-11.0.0-x86_64-apple-darwin.tar.xz><tt>clang+llvm-11.0.0-x86_64-apple-darwin.tar.xz</tt></a> from <a href=https://releases.llvm.org/download.html>https://releases.llvm.org/download.html</a> and extracted it to <tt>~/.opt/</tt>, then added a <tt>.cargo/config.toml</tt> to my workspace.</p><blog-code syntax=toml><pre>
# .cargo/config.toml
[build]
[target.armv7-unknown-linux-gnueabihf]
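# rustc will invoke this binary for the final link, instead of the default "cc"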
linker = "/Users/john/.opt/clang+llvm-11.0.0-x86_64-apple-darwin/bin/lld"
</pre></blog-code><p></p><blog-code syntax=commands><pre>
cargo build --target armv7-unknown-linux-gnueabihf
# Compiling helloworld v0.0.1 (/Users/john/src/rust-cross-compilation)
# error: linking with `/Users/john/.opt/clang+llvm-11.0.0-x86_64-apple-darwin/bin/lld` failed: exit code: 1
# |
# = note: "/Users/john/.opt/clang+llvm-11.0.0-x86_64-apple-darwin/bin/lld" "-flavor" "gnu" "--eh-frame-hdr" "-L"
# [...]
# "-Bdynamic" "-lgcc_s" "-lc" "-lm" "-lrt" "-lpthread" "-lutil" "-ldl" "-lutil"
# = note: lld: error: unable to find library -lgcc_s
# lld: error: unable to find library -lc
# lld: error: unable to find library -lm
# lld: error: unable to find library -lrt
# lld: error: unable to find library -lpthread
# lld: error: unable to find library -lutil
# lld: error: unable to find library -ldl
# lld: error: unable to find library -lutil</pre></blog-code><p>Getting closer!</p><p>The linker is being told to build an executable that dynamically links against the GNU libc, which I don't have a copy of. One option here is to download it from (for example) the Ubuntu package hosting, but I don't want to do that because I don't think a Rust binary should be depending on <tt>libc</tt> at all. Rust ought to be considered a replacement for C, rather than a thin layer on top.</p><p>Therefore I'm going to switch the Cargo target to the MUSL variant, which treats <tt>libc</tt> as an implementation detail rather than a core component of the platform.</p><blog-code syntax=commands><pre>
rustup target add armv7-unknown-linux-musleabihf
# info: downloading component 'rust-std' for 'armv7-unknown-linux-musleabihf'
# info: installing component 'rust-std' for 'armv7-unknown-linux-musleabihf'
# info: using up to 500.0 MiB of RAM to unpack components
# 15.8 MiB / 15.8 MiB (100 %) 12.1 MiB/s in 1s ETA: 0s</pre></blog-code><p></p><blog-code syntax=toml><pre>
# .cargo/config.toml
[build]
[target.armv7-unknown-linux-musleabihf]
linker = "/Users/john/.opt/clang+llvm-11.0.0-x86_64-apple-darwin/bin/lld"
</pre></blog-code><p></p><blog-code syntax=commands><pre>
cargo build --target armv7-unknown-linux-musleabihf
# Compiling helloworld v0.0.1 (/Users/john/src/rust-cross-compilation)
# Finished dev [unoptimized + debuginfo] target(s) in 1.50s</pre></blog-code><p>Success! The resulting binary is a valid executable for ARMv7 Linux, and can be run as-is on the Raspberry Pi.</p><blog-code syntax=commands><pre>
file target/armv7-unknown-linux-musleabihf/debug/helloworld
# target/armv7-unknown-linux-musleabihf/debug/helloworld: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, with debug_info, not stripped</pre></blog-code></blog-section><blog-section><h2 slot=title>Bazel</h2><div style="float:right;padding:0 0 0 2em"><img src=https://john-millikin.com/by-sha256/05daef8103f981c102f1b8486bd7c97f625bdffb14e0ce4875dc4a2ea2b5941e/bazel-icon.svg style=width:300px;height:300px></div><p><a href=https://bazel.build/>Bazel</a> is a language-agnostic build system. Its configuration language deals in actions and dependency graphs, rather than executables and libraries, which gives it some interesting scaling properties:</p><ul><li>Building single-language projects with Bazel can be more difficult than using language-specific tools.</li><li>Building multi-language projects is substantially easier in Bazel than in any other build system.</li></ul><p>This makes Bazel a natural choice of build tool for any system that involves (1) FFI, (2) generated code, or (3) well-factored subsystems. It is uniquely capable when compared to Cargo because it can build multiple Rust libraries ("crates") within a single workspace.</p><p style=clear:both>The first step to build Rust with Bazel is to configure the <tt>WORKSPACE</tt> to depend on <a href=https://github.com/bazelbuild/rules_rust>rules_rust</a>. This will also define the default Rust version and edition. There's no need to install toolchains or targets, because Bazel will fetch them on demand.</p><blog-code syntax=python><pre>
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "io_bazel_rules_rust",
    # HEAD commit as of 2020-12-05
    urls = ["https://github.com/bazelbuild/rules_rust/archive/67f0c5ec0397d24ccc14264a0eda86915ddf63e8.tar.gz"],
    sha256 = "c587d402e4502100b01e4ba7d9584809cf4f4eb2d2f6634097883637bfb512b1",
    strip_prefix = "rules_rust-67f0c5ec0397d24ccc14264a0eda86915ddf63e8",
)

load("@io_bazel_rules_rust//rust:repositories.bzl", "rust_repositories")

rust_repositories(
    edition = "2018",
    version = "1.48.0",
)
</pre></blog-code><p>Next we need to create a top-level <tt>BUILD</tt> file. This will define a <tt>rust_binary</tt> target for our hello-world executable, and also a <tt>platform</tt> describing what sort of system we want to build for.</p><blog-code syntax=python><pre>
# BUILD.bazel
load("@io_bazel_rules_rust//rust:rust.bzl", "rust_binary")

rust_binary(
    name = "helloworld",
    srcs = ["helloworld.rs"],
)

platform(
    name = "linux-armv7",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:arm",
    ],
)
</pre></blog-code><p>In the future the platform definition could use a more specific <tt>"cpu:armv7"</tt> constraint (<a href=https://github.com/bazelbuild/rules_rust/pull/509>bazelbuild/rules_rust#509</a>) and also constrain on the Rust release channel (<a href=https://github.com/bazelbuild/rules_rust/pull/510>bazelbuild/rules_rust#510</a>).</p><p>Anyway, that should be enough, but if we try running it we'll hit an error about missing toolchains.</p><blog-code syntax=commands><pre>
bazel build //:helloworld --platforms=//:linux-armv7
# [...]
# ERROR: While resolving toolchains for target //:helloworld: no matching toolchains found for types @io_bazel_rules_rust//rust:toolchain</pre></blog-code><p>This is because rules_rust doesn't pre-register toolchains for all supported target platforms – it makes the user register each (host, target) mapping explicitly. We need to tell rules_rust to register a toolchain that can run on macOS (Darwin) and build for ARMv7 Linux.</p><blog-code syntax=python><pre>
# WORKSPACE
load("@io_bazel_rules_rust//rust:repositories.bzl", "rust_repository_set")

rust_repository_set(
    name = "rust_linux_armv7",
    edition = "2018",
    exec_triple = "x86_64-apple-darwin",
    extra_target_triples = ["arm-unknown-linux-musleabihf"],
    rustfmt_version = "1.4.20",
    version = "1.48.0",
)
</pre></blog-code><p></p><blog-code syntax=commands><pre>
bazel build //:helloworld --platforms=//:linux-armv7
# [...]
# INFO: From Compiling Rust bin helloworld (1 files):
# error: linking with `external/local_config_cc/cc_wrapper.sh` failed: exit code: 1
# |
# = note: "external/local_config_cc/cc_wrapper.sh" "-Wl,--as-needed" "-Wl,-z,noexecstack" "-Wl,--eh-frame-hdr" "-nostartfiles"
# = note: clang: warning: argument unused during compilation: '-no-pie' [-Wunused-command-line-argument]
# ld: unknown option: --as-needed
# clang: error: linker command failed with exit code 1 (use -v to see invocation)</pre></blog-code><p>This is the same linker error as we saw with Cargo, and the solution is to tell rules_rust that it should use LLD. However, there's a problem – rules_rust doesn't have its own linker toolchain; it uses the C/C++ toolchain to find a linker.</p><p>We must now contend with the Bazel C/C++ configuration system, which is designed to handle the world's wide range of strange C compilers. I'm not going to give a blow-by-blow here because none of it is relevant to Rust, but a summary is:</p><ul><li>We create a new Bazel package <tt>//cc-toolchain</tt> that will contain the C/C++ configuration. I'm just going to pull in the linker from the filesystem rather than properly <tt>repository_rule</tt> it, so the toolchain file sets will be empty stubs.</li><li>The <tt>CcToolchainConfigInfo</tt> itself requires paths to a bunch of different tools; since the only one needed here is <tt>lld</tt> I'll hardcode the rest to <tt>/bin/false</tt>.</li><li>This project doesn't need to build any C/C++ code for the host (e.g. for codegen), so I'm going to override <tt>--host_crosstool_top</tt> rather than define a true host-compatible toolchain.</li></ul><p>A more complete solution would probably involve the Clang-based toolchains defined in <a href=https://github.com/bazelbuild/bazel-toolchains>https://github.com/bazelbuild/bazel-toolchains</a>.</p><blog-code syntax=python><pre>
# cc-toolchain/BUILD
load(":config.bzl", "cc_toolchain_config")

filegroup(name = "empty")

cc_toolchain_suite(
    name = "clang_suite",
    toolchains = {
        "armv7": ":armv7_toolchain",
    },
)

cc_toolchain(
    name = "armv7_toolchain",
    all_files = ":empty",
    compiler_files = ":empty",
    dwp_files = ":empty",
    linker_files = ":empty",
    objcopy_files = ":empty",
    strip_files = ":empty",
    supports_param_files = 0,
    toolchain_config = ":armv7_toolchain_config",
    toolchain_identifier = "armv7-toolchain",
)

cc_toolchain_config(name = "armv7_toolchain_config")
</pre></blog-code><p></p><blog-code syntax=python><pre>
# cc-toolchain/config.bzl
load(
    "@bazel_tools//tools/cpp:cc_toolchain_config_lib.bzl",
    "action_config",
    "tool",
    "tool_path",
)
load(
    "@bazel_tools//tools/build_defs/cc:action_names.bzl",
    "CPP_LINK_EXECUTABLE_ACTION_NAME",
)

LLD = "/Users/john/.opt/clang+llvm-11.0.0-x86_64-apple-darwin/bin/lld"

def _cc_toolchain_config_impl(ctx):
    return cc_common.create_cc_toolchain_config_info(
        ctx = ctx,
        toolchain_identifier = "armv7-toolchain",
        host_system_name = "local",
        target_system_name = "armv7-unknown-linux-musleabihf",
        target_cpu = "armv7",
        target_libc = "unknown",
        compiler = "clang",
        abi_version = "unknown",
        abi_libc_version = "unknown",
        # Linking executables is the only action this toolchain needs to
        # support, and LLD is the only real tool it provides.
        action_configs = [
            action_config(
                action_name = CPP_LINK_EXECUTABLE_ACTION_NAME,
                enabled = True,
                tools = [tool(path = LLD)],
            ),
        ],
        tool_paths = [
            tool_path(
                name = "ld",
                path = LLD,
            ),
            tool_path(
                name = "ar",
                path = "/usr/bin/ar",
            ),
            tool_path(
                name = "cpp",
                path = "/bin/false",
            ),
            tool_path(
                name = "gcc",
                path = "/usr/bin/clang",
            ),
            tool_path(
                name = "gcov",
                path = "/bin/false",
            ),
            tool_path(
                name = "nm",
                path = "/bin/false",
            ),
            tool_path(
                name = "objdump",
                path = "/bin/false",
            ),
            tool_path(
                name = "strip",
                path = "/bin/false",
            ),
        ],
    )

cc_toolchain_config = rule(
    implementation = _cc_toolchain_config_impl,
    attrs = {},
    provides = [CcToolchainConfigInfo],
)
</pre></blog-code><p>Whew. With that mess dealt with, rules_rust will now link with LLD and produce valid ARMv7 Linux binaries.</p><blog-code syntax=commands><pre>
bazel build //:helloworld --platforms=//:linux-armv7 \
    --cpu=armv7 \
    --crosstool_top=//cc-toolchain:clang_suite \
    --host_crosstool_top=@bazel_tools//tools/cpp:toolchain
# INFO: Invocation ID: f6c497d9-48db-4240-85b5-c8bfa675c49b
# INFO: Analyzed target //:helloworld (10 packages loaded, 274 targets configured).
# INFO: Found 1 target...
# Target //:helloworld up-to-date:
# bazel-bin/helloworld
# INFO: Elapsed time: 33.660s, Critical Path: 0.45s
# INFO: 10 processes: 5 remote cache hit, 5 internal.
# INFO: Build completed successfully, 10 total actions</pre></blog-code><p></p><blog-code syntax=commands><pre>
file bazel-bin/helloworld
# bazel-bin/helloworld: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, with debug_info, not stripped</pre></blog-code></blog-section><blog-section><h2 slot=title>Suggestions</h2><ol><li>rules_rust has some work to do on making its toolchains ergonomic. Right now they couple the host binaries and target libraries into a single <tt>ToolchainInfo</tt>, which means Bazel can't resolve them separately based on host and target constraints. If they were split up (<a href=https://github.com/bazelbuild/rules_rust/issues/523>bazelbuild/rules_rust#523</a>) then the entire set of supported targets could be pre-registered by a <tt>rust_toolchains()</tt> macro.</li><li>rules_rust should decouple its linker command from the C/C++ toolchain. I shouldn't have to touch anything related to <tt>cc</tt> to get a working <tt>rustc</tt> + <tt>lld</tt> combo.</li><li>Both rustup and rules_rust should integrate support for LLD. While I'm not sure if it should be the default for all platforms, it should definitely be the default (or strongly recommended) for cross-compilation.</li><li><p>The LLVM project should offer non-monolithic downloads of individual tools, or alternatively the Rust project should host a stripped-down archive for LLD. The full LLVM binary distribution is huge, and it doesn't make sense to make users download a complete copy of Clang just so they can link ELF binaries on macOS.</p><blog-code syntax=commands><pre>
du -sh clang+llvm-11.0.0-x86_64-apple-darwin/
# 2.4G clang+llvm-11.0.0-x86_64-apple-darwin/
du -sh clang+llvm-11.0.0-x86_64-apple-darwin/bin/lld
# 81M clang+llvm-11.0.0-x86_64-apple-darwin/bin/lld</pre></blog-code><p>It doesn't even use any of the bundled dylibs!</p><blog-code syntax=commands><pre>
otool -L clang+llvm-11.0.0-x86_64-apple-darwin/bin/lld
# clang+llvm-11.0.0-x86_64-apple-darwin/bin/lld:
# /usr/lib/libxml2.2.dylib (compatibility version 10.0.0, current version 10.9.0)
# /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
# /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
# /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)
# /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 902.1.0)</pre></blog-code></li><li>Cross-compilation should be covered by official Rust documentation. The Rust Book's maintainers have declined to add a chapter about it (<a href=https://github.com/rust-lang/book/issues/2367>rust-lang/book#2367</a>), which makes me sad, but I am hopeful that it might one day be covered in the <a href=https://rust-embedded.github.io/book/>Embedded Rust book</a>.</li></ol></blog-section><blog-footnotes><hr><ol><li id=fn:1><p>If a tutorial on cross-compiling Rust starts off with installing Docker or Vagrant then I'm not fucking reading it. And stop linking me to <tt>rust-embedded/cross</tt>; hiding these insane dependency stacks behind a "magical" wrapper doesn't help anybody worth helping.</p></li><li id=fn:2><p>Except for GCC, which like most GNU software chooses to remain frozen in a grotesque parody of mid-80s UNIX.</p></li><li id=fn:3><p>I've heard this is due to the Rust standard library's dependency on <tt>libc</tt>, thus requiring a C toolchain and headers to build <tt>std</tt> for a given platform.</p></li></ol></blog-footnotes></blog-article>2020-12-12T09:34:16ZFirst impressions of Rust2020-08-06T22:23:05Zurn:uuid:a3152b0f-5e2a-49c7-9477-0e2f7ebef489<style type=text/css scoped>li{margin:.5em 0}p,li{line-height:1.5}.visible-tab{color:#c6d4e0}tt{white-space:nowrap}</style><blog-article posted=2020-08-06T22:23:05Z><h1 slot=title>First impressions of Rust</h1><div slot=tableofcontents></div><div style="float:right;margin:0 0 2em 2em"><img src=https://john-millikin.com/by-sha256/ab42a08a18def418ac77f16a96e8fee54ccd823ed3b0d8ebdc74bee9dba01121/crab.jpg style=max-width:400px></div><p>I've been wanting to write a big project in <a href=https://www.rust-lang.org/>Rust</a> for a while as a learning exercise, and actually started one in late 2018 (a FUSE server implementation). But then life happened and I got busy and never went anywhere with it. Due to certain world circumstances I'm currently spending a lot of time indoors, so <a href=https://github.com/jmillikin/rust-fuse>rust-fuse</a> (<a href=https://jmillikin.github.io/rust-fuse/fuse-v0.0.1-a6ad16d1127d36f80e6d02f36e48da56920ca693/fuse/>docs</a>) now exists and is good enough to write basic hello-world filesystems. I plan to polish it up a bit more with the goal of releasing a v1.0 that supports the same use cases as <a href=https://github.com/libfuse/libfuse>libfuse</a>.</p><p>I took some notes along the way about things that struck me as especially good or bad. Overall I quite like Rust the language, have mixed feelings about the quality of ancillary tooling, and have strong objections to some decisions made by the packaging system (Cargo + crates.io).</p><blog-section><h2 slot=title>Background</h2><p>I've been programming professionally for 15 years, primarily network servers and GUIs on Linux. Between roughly 2009 and 2015 I experimented with using Haskell for systems programming, writing several projects in pure Haskell (<a href=/software/haskell-dbus>haskell-dbus</a>, <a href=/software/anansi>Anansi</a>) and as bindings (<a href=/software/haskell-ncurses>haskell-ncurses</a>, <a href=/software/haskell-cpython>haskell-cpython</a>).
However, I couldn't achieve the sorts of reliability improvements over bread-and-butter C++ that I had hoped for:<ul><li>Haskell has a lot of tools for reasoning about the structure of computation, notably monads for declarative I/O, but it doesn't do much to help the programmer with non-algorithmic concerns such as memory lifetimes. I spent a lot of time debugging dangling pointers and race conditions.</li><li>I found it very difficult to write Haskell code that could run as fast as C. Avoiding allocation, auto-boxing, etc. felt like it required a deep knowledge of undocumented or unspecified GHC behavior.</li></ul></p><p>In late 2015 I started rough sketches for a new language, Funk, that would combine the type-safety of Haskell with the low-level precision of C/C++. Funk was strongly influenced by Google's internal dialect of C++, which uses smart pointers and sum types (e.g. <tt>StatusOr<T></tt>) to improve memory safety – many of its features later became part of the C++11 and C++14 standards. To this foundation I bolted on Haskell-style typeclasses and modules, then started writing a Funk-to-C translator based on <a href=https://wiki.gnome.org/Projects/Vala/Documentation>Vala</a>.</p><p>At some point I was looking around for inspiration on how to handle memory allocation (I planned to use scoped arenas as the fundamental dynamic memory system) and I discovered Rust. Here was a language that was solving the same problems as Funk but (1) better designed, (2) already implemented, and (3) supported by an entire team of compiler experts. So that was that, Funk went to <tt>/dev/null</tt> and I logged a TODO to learn Rust.</p></blog-section><blog-section><h2 slot=title>The Rust language</h2><p>It shouldn't come as a surprise that someone looking for a cross between C++ and Haskell would like Rust, but I want to be clear: I really <i>really</i> enjoy using Rust. It is nearly everything I want in a systems programming language, and the few parts it's missing are due to legitimate technical difficulty. The amount of careful thought I've seen put into its design – crafting <tt>async/await</tt> to work in <tt>no_std</tt> environments, or the <a href=https://blog.rust-lang.org/inside-rust/2020/06/08/new-inline-asm.html>new inline assembly syntax</a> – has produced a language that is not only better than what I could have designed, it's better among axes I was not even aware existed prior to reading the Rust RFCs.</p><p>The "nightly" release channel is an excellent idea that I wish more infrastructure software made use of. Stabilizing individual features on their own schedules lets the compiler maintain a blistering release cadence (stable releases every <i>six weeks</i>!). Users are empowered to choose their own preferred point on the maintenance/velocity curve, opting in to higher upgrade costs in exchange for early access to new features. The "editions" system goes a bit further, derisking backwards-incompatible syntax changes that would have stymied C++ for decades (see: trigraphs).</p><blog-section><h3 slot=title>Type system</h3><p>Rust has a reasonable amount of Haskell-style type programming, though I wouldn't mind a <i>bit</i> more. Some parts of its type system are limited in non-intuitive ways – for example only lifetime-kinded type parameters can be universally quantified in a trait bound. I hit a lot of compiler errors that recognized exactly what I wanted to do but wouldn't let me do it.</p><p>I wish Rust's type system supported:<ul><li>Closed-world ("sealed") traits.
Rust's rules against private types in the public API are good civilization but they make it difficult to define pseudo-private traits like <tt>Mount</tt> that I want users to name but not implement or call into.</li><li>Associated types in structs. Rust lets traits have associated types, and structs can have associated <i>values</i>, but there's no equivalent to the nested type names found in C++ or Java.</li><li>Very basic dependent typing, or maybe something like Eiffel's contracts, for the purpose of eliminating array bounds checks. I'd like to be able to say "this function accepts a <tt>&[u8]</tt> of at least <tt>size_of<SomeType></tt>" so I can do safe unchecked byte poking.</li></ul></p></blog-section><blog-section><h3 slot=title>Standard library</h3><p>There's a lot of standard UNIX functionality that's missing from the Rust standard library. Some of it is more-or-less available from separate packages like <a href=https://crates.io/crates/nix>nix</a>, but I shouldn't have to depend on four crates plus a C compiler to get access to <tt>getuid()</tt>. I shouldn't have to depend on <i>anything</i> to get the definition of <tt>ENOSYS</tt> or the size of <tt>c_ulong</tt>. Go is the gold standard here – it can cross-compile to a Linux target from macOS using its own copies of the Linux syscall table – and even Haskell has <a href=http://hackage.haskell.org/package/base-4.14.0.0/docs/Foreign-C-Types.html><tt>Foreign.C.Types</tt></a>.</p><p>A <tt>std::os::unix</tt> without <tt>getuid()</tt> is incomplete but can be worked around with a small <tt>extern "C"</tt> block. Much worse is the lack of macro-dependent functions like <tt>recvmsg()</tt>, which is not a great API to begin with, or functions with OS-dependent arity like <tt>mount()</tt>. Rust is not averse to providing clean wrappers around the OS library – the <tt>std::fs</tt> and <tt>std::process</tt> modules contain little else – so it's frustrating to see these very basic functions left out.</p></blog-section></blog-section><blog-section><h2 slot=title>Tooling</h2><blog-section><h3 slot=title>rustdoc</h3><p>I categorize documentation generators into two basic groups:<ul><li>First is the <a href=https://www.sphinx-doc.org/>Sphinx</a> group, which consumes prose and uses embedded pragmas to reference symbols of the library being documented. The output layout tends to be textbook-like, containing long "chapters" that might cover entire modules in one HTML file. Sphinx-style docs are popular among Python programmers.</li><li>Second is the <a href=https://www.doxygen.nl/>Doxygen</a> group, which consumes source code and generates a rigidly-structured catalog of symbols with optional attached prose. The output feels more like an encyclopedia or reference manual.</li></ul></p><p><tt>rustdoc</tt> is obviously in the second category. It is designed to consume doc comments, which are special-cased by the Rust compiler, and produces output closely matching the structure of the exported API. At this task <tt>rustdoc</tt> does a reasonable job: the page layout is navigable, the markup format (<tt>rustdoc</tt> uses Markdown) isn't great but it could be worse, and it doesn't hardcode absolute file paths into the output like Haddock.</p><p>Some of its annotations, like whether a symbol is OS-specific (<a href=https://github.com/rust-lang/rust/issues/43781>rust-lang/rust#43781</a>), are gated to the Nightly toolchain.
It's not obvious to me why they do this – it's a documentation generator, why does it care what version of the Rust compiler I'm using? What's more, some of its functionality is reserved for the standard library only. I can't mark fields as unstable (subject to change in future library versions) because that annotation is based on the <tt>#[unstable]</tt> attribute, which the compiler reserves for its own use. Ditto for annotations about which version a symbol was added in. If I'm going to use a Doxygen-group tool then I don't want it to get too fussy about what libraries it's documenting.</p></blog-section><blog-section><h3 slot=title>rustfmt</h3><p>Something like a cross between <tt>gofmt</tt>, <tt>clang-format</tt>, and GNU indent. It has a lot of configuration options but all the interesting ones are gated to Nightly, and most of those are much less useful than you might expect.</p><p>As a representative sample, consider <tt>rustfmt</tt>'s handling of hard tabs. Given the following input there are two basic ways you might use hard tabs to indent it, depending on whether struct value alignment should apply to nested structs:</p><blog-code><pre>
MyStruct{
field_with_long_name: (some_big_complex_variable_name + another_big_complex_variable_name),
another_field: 123,
nested_struct: &NestedStruct{
nested_struct_field: 456,
},
final_field: 123,
}
</pre></blog-code><p>The first is to treat the nested struct as a "break" in the alignment (<tt>gofmt</tt> does this). I've drawn the tabs as <span class=visible-tab>████</span> for clarity:</p><blog-code><pre class=language-rust>
<code class=language-rust>MyStruct{
<span class=visible-tab>████</span>field_with_long_name: (some_big_complex_variable_name
<span class=visible-tab>████</span> + another_big_complex_variable_name),
<span class=visible-tab>████</span>another_field: 123,
<span class=visible-tab>████</span>nested_struct: &NestedStruct{
<span class=visible-tab>████████</span>nested_struct_field: 456,
<span class=visible-tab>████</span>},
<span class=visible-tab>████</span>final_field: 123,
}
</code></pre></blog-code><p>The second is to align all the values, including the nested struct, and introduce a nested layer of tabs:</p><blog-code><pre class=language-rust>
<code class=language-rust>MyStruct{
<span class=visible-tab>████</span>field_with_long_name: (some_big_complex_variable_name
<span class=visible-tab>████</span> + another_big_complex_variable_name),
<span class=visible-tab>████</span>another_field: 123,
<span class=visible-tab>████</span>nested_struct: &NestedStruct{
<span class=visible-tab>████</span> <span class=visible-tab>████</span>nested_struct_field: 456,
<span class=visible-tab>████</span> },
<span class=visible-tab>████</span>final_field: 123,
}
</code></pre></blog-code><p>But what <tt>rustfmt</tt> produces is an indecisive and poorly formatted combo of the two – it doesn't even properly align the parenthesized expression after line-breaking it:</p><blog-code><pre class=language-rust>
<code class=language-rust>MyStruct {
<span class=visible-tab>████</span>field_with_long_name: (some_big_complex_variable_name
<span class=visible-tab>████████</span>+ another_big_complex_variable_name),
<span class=visible-tab>████</span>another_field: 123,
<span class=visible-tab>████</span>nested_struct: &NestedStruct {
<span class=visible-tab>████████</span>nested_struct_field: 456,
<span class=visible-tab>████</span>},
<span class=visible-tab>████</span>final_field: 123,
}
</code></pre></blog-code><p>I eventually gave up on trying to make the formatted rust-fuse code look pretty, and settled for "consistent".</p></blog-section></blog-section><blog-section><h2 slot=title>Cargo and crates.io</h2><p>While the Rust language feels carefully designed to combine the best parts of multiple popular and interesting languages, Rust's default build system (Cargo) and package repository (crates.io) are the opposite. They combine the worst parts of Cabal/Hackage and NPM, resulting in a user experience that is somehow inferior to both.</p><blog-section><h3 slot=title>Package naming</h3><p>crates.io has no namespacing. If a user uploads a package named <tt>fuse</tt>, that name is taken forever and no other person can upload a package named <tt>fuse</tt> unless the first developer transfers ownership. It so happens that someone did in fact upload <a href=https://crates.io/crates/fuse>crates.io/crates/fuse</a> in 2014 (last updated: 2017), which means I'm going to have to publish mine under some stupid codename or contrived <tt>rusty-libfuse-for-rust-lib</tt> nonsense.</p><p>How did this happen? It's not like package registries are a new invention – PyPI launched in 2003, and CPAN has been running since 1995 (!). NPM has had optional namespaces ("scopes") since at least 2014.</p><p>Go demonstrates how to do distributed package naming well. A Go package is identified by a hierarchical path rooted at a DNS domain, which both solves the issue of ownership (defer to DNS) and lets big shared hosting providers like GitHub cleanly subdivide their namespace. If Cargo had done the same we might have package names like <tt>"github.com/rust-lang/git2-rs"</tt>, which, while not <i>great</i>, at least avoids staking a claim on the very concept of Git.</p><p>But since crates.io is centralized, it can be terser than Go. Cargo could use crates.io as the <i>default</i>, using the presence of a period to distinguish non-default registries in <tt>Cargo.toml</tt>. And if you combined it with NPM's sigils, the official libgit2 binding <tt>"@rust/git2"</tt> could be registered at the same time as Jane Doe's experimental <tt>"~jdoe/git2"</tt> package, and could live on the same Internet as <tt>"example.com/rust-stuff/git2"</tt>. Everyone would have the chance to contribute their code to the commons under a reasonable name.</p></blog-section><blog-section><h3 slot=title>Single-crate packaging</h3><p>Cargo's unit of distribution is the crate, which is a problem because Rust's compilation unit is <i>also</i> the crate. Large libraries are easier to work on when parts of the build graph can be cached, but if you try to split up a library you pretty immediately run into problems:<ul><li>Cargo won't let you define more than one <tt>[lib]</tt> per <tt>Cargo.toml</tt>, so what would be a minor refactoring requires converting the project repository into a "workspace". As a side effect this breaks many common commands, for example <tt>cargo test</tt> must be replaced with <tt>cargo test --all</tt>.</li><li>Cargo can handle release archives containing multiple crates (via path dependencies), but crates.io rejects uploads containing crates with path deps. This leads to an explosion of crate registrations, as each project needs to upload its internal organs as separate packages.
Good luck with figuring out semver for <tt>mypkg-internal-macros</tt> – might as well version them all <tt>"v0.0.$(date +%s)"</tt>.</li></ul></p><p>When I was writing Rust without Cargo I was confused about why people complained about slow build times, but now I get it. Of course build times are slow if changing one line of a leaf file requires rebuilding dozens of modules. I've found Bazel and <a href=https://github.com/bazelbuild/rules_rust>rules_rust</a> provide a good alternative to Cargo, since Bazel can twist your build into any DAG you want, but most Rust users are unlikely to be excited about injecting 50MB of Java build system into the middle of their workflow.</p></blog-section></blog-article>2020-08-06T22:23:05ZCommentary on “Stop Using Encrypted Email”2020-02-22T03:40:51Zurn:uuid:1c2a16e0-f644-4b98-a69b-bb25da340852<style type=text/css scoped>li{margin:.5em 0}p,li{line-height:1.5}blockquote{border-left:10px solid #ccc;margin:.5em;padding:.1em .5em}</style><blog-article posted=2020-02-22T03:40:51Z><h1 slot=title>Commentary on “Stop Using Encrypted Email”</h1><div slot=tableofcontents></div><p>Latacora’s recent article <a href=https://latacora.micro.blog/2020/02/19/stop-using-encrypted.html>Stop Using Encrypted Email</a> prompted a lot of comments on <a href=https://tildes.net/~comp/m0o/stop_using_encrypted_email>Tildes</a> and <a href="https://news.ycombinator.com/item?id=22368888">Hacker News</a>, which is not surprising considering the author’s blunt approach to a delicate topic. I agree with its recommendations and also think it would be good if other people did so, thus, I’m going to try expanding on it a little. Think of this as comments from the peanut gallery.</p><p>In summary:</p><ul><li>Existing end-to-end encryption systems for email (including PGP and S/MIME) do not provide useful protection against interception.</li><li>Designing a useful end-to-end encryption system for email is infeasible due to the design of the SMTP protocol and the behavior of existing email clients.</li><li>If a message needs end-to-end encryption then it should not be sent via email. <a href=https://signal.org/>Signal</a> is a reasonable choice for sending encrypted messages.</li><li>If a message does not need end-to-end encryption then send it via normal unencrypted email.</li><li>Since encrypted email does not provide useful protection, and unencrypted email is easier to use, there is no reason to send encrypted email.</li></ul><blog-section><h2>“Email is unsafe”</h2><blockquote><p>Email is unsafe and cannot be made safe. The tools we have today to encrypt email are badly flawed. Even if those flaws were fixed, email would remain unsafe. Its problems cannot plausibly be mitigated. Avoid encrypted email.</p></blockquote><p>The basic problem of email (meaning specifically SMTP) is that it was never designed to be secure. The address must be sent in a form readable by the recipient’s mail server, and auxiliary data like the subject comes along for the ride because that’s just how the wire protocol works.</p><p>There have been two major security retrofits added to SMTP since the original <a href=https://tools.ietf.org/html/rfc821>RFC 821</a> was published in 1982:</p><ul><li>STARTTLS (2002) is intended to protect emails from being intercepted while being sent between mail servers. 
Unfortunately it is not considered a strong security layer because it doesn’t protect against man-in-the-middle<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref> and is not widely adopted among smaller mail service operators<blog-footnote-ref>[<a href=#fn:2>2</a>]</blog-footnote-ref>.</li><li>Message-level encryption such as PGP (1991) and S/MIME (2002) is intended to (a) authenticate the sender of emails, and (b) protect the content of emails from interception by any third party (including the mail services). This is what people mean when they say “encrypted email”. Both PGP and S/MIME have design flaws that make them ineffective against interception<blog-footnote-ref>[<a href=#fn:3>3</a>]</blog-footnote-ref>, and these flaws cannot be plausibly fixed.</li></ul><p>Even if these two security-oriented additions worked as intended – if emails were protected in transit, and message bodies end-to-end – the design of SMTP still makes email unsuitable for secure communication. More on this below.</p></blog-section><blog-section><h2>“Most email encryption […] is performative”</h2><blockquote><p>Most email encryption on the Internet is performative, done as a status signal or show of solidarity. […] It doesn’t matter whether or not these emails are safe, which is why they’re encrypted so shoddily.</p></blockquote><p>It is common, among geeks of a certain age<blog-footnote-ref>[<a href=#fn:4>4</a>]</blog-footnote-ref>, to have a GnuPG key. In bygone times conferences and meetups would host “key-signing parties”, where people brought their laptops and photo IDs and solemnly cross-signed each other’s keys. Hex-formatted key fingerprints were inspected. I swear I am not making this up.</p><p>In retrospect none of it was real, or rather, none of it mattered. We never put much thought into the actual cryptography – the few GPG’d emails I sent could have been unpadded CBC for all I know. The point of signing your coworker’s GnuPG key was not to establish a secure comms channel. It was all about the ceremony: the grown-up version of children passing notes in Pig Latin.</p></blog-section><blog-section><h2>“PGP is a deeply broken system”</h2><blockquote><p>The least interesting problems with encrypted email have to do with PGP. <a href=https://latacora.micro.blog/2019/07/16/the-pgp-problem.html>PGP is a deeply broken system</a>. It was designed in the 1990s, and in the 20 years since it became popular, cryptography has advanced in ways that PGP has not kept up with.</p></blockquote><p>The early ’90s were not good years for computer security. A lot of primitives and protocols designed back then turned out to be either too complex (ASN.1) or not sophisticated enough (MD5, SSL v2/v3). PGP managed to be both at once, combining a challenging<blog-footnote-ref>[<a href=#fn:5>5</a>]</blog-footnote-ref> packet-oriented record format with a small list of supported ciphers.</p><p>The problems with PGP are not limited to the protocol, but extend to the usability and security properties of related projects such as GnuPG<blog-footnote-ref>[<a href=#fn:6>6</a>]</blog-footnote-ref> and the SKS keyserver network<blog-footnote-ref>[<a href=#fn:7>7</a>]</blog-footnote-ref>.
If starting from scratch, no contemporary security engineer would design something as slapdash as PGP.</p><blockquote><p>So, for example, it recently turned out to be possible for eavesdroppers to decrypt messages without a key, simply by tampering with encrypted messages.</p></blockquote><p>This is referencing the <a href=https://efail.de/>EFAIL</a> attack, in which most mail clients can be convinced to send the entire contents of an encrypted message to an arbitrary remote server.</p></blog-section><blog-section><h2>“All mainstream email software expects plaintext”</h2><blockquote><p>The foundations of electronic mail are plaintext. All mainstream email software expects plaintext. In meaningful ways, the Internet email system is simply designed not to be encrypted.</p><p>The clearest example of this problem is something every user of encrypted email has seen: the inevitable unencrypted reply. In any group of people exchanging encrypted emails, someone will eventually manage to reply in plaintext, usually with a quoted copy of the entire chain of email attached.</p></blockquote><p>Some people reading this might have GPG keys; a smaller set might even have exchanged GPG-encrypted emails with a contact. If you were to send an encrypted email, and receive a reply in plaintext that quoted your plaintext message, would you be upset?</p><p>If the answer is “probably not”, then what security value is GPG providing?</p><p>The strength of a security system is not only in the low-level parts, the ciphers and key sizes and so on, but in how it interacts with humans and guides them towards safety. A tool designed to send un-interceptable messages should make it difficult to accidentally leak the entire conversation. This is not true of PGP (as typically used<blog-footnote-ref>[<a href=#fn:8>8</a>]</blog-footnote-ref>), which is why PGP should be avoided.</p></blog-section><blog-section><h2>“Metadata is as important as content, and email leaks it”</h2><blockquote><p>Leave aside the fact that the most popular email encryption tool doesn’t even encrypt subject lines, which are message content, not metadata.</p><p>The email “envelope” that includes the sender, the recipient, and timestamps – is unencrypted and always will be. Court cases (and lists of arrest targets) have been won or lost on little more than this. Internet email creates a durable log of metadata, one that every serious adversary is already skilled at accessing.</p></blockquote><p>In security circles there is often talk of an “adversary”, which summarizes what the system is supposed to protect against. The adversary of a child-proof medicine bottle is a curious toddler. The adversary of a journalist reporting on organized crime is the criminal organization. The adversary of a group of anti-government rebels is the local government’s law enforcement agency. In each case the identity, capabilities, and goals of the adversary are used to decide what the system must protect against.</p><p>A typical “threat modeling” exercise might start off with something like this:</p><ul><li><b>Adversary:</b> a ten-year-old child</li><li><b>Goal:</b> find out what present they’re getting for Christmas</li><li><b>Capability:</b> physical access to parent’s locked iPhone</li></ul><p>In this case, organizing the purchase of a Christmas present over encrypted email is probably not worth the effort. A passcode on the phone, or at most some basic precautions (e.g. 
not putting the word ‘Christmas’ in the planning emails) is probably sufficient.</p><p>Alternatively:</p><ul><li><b>Adversary:</b> the United States federal government</li><li><b>Goal:</b> obtain evidence of contact between Jane Doe and John Smith</li><li><b>Capability:</b> can demand a full copy of Jane’s email account pursuant to an authorized search warrant</li></ul><p>In this case the use of encrypted email isn’t sufficient. The presence of emails between Jane and John satisfies the adversary’s goal, and they can always come back for round two by demanding (from Jane and/or John) the emails be decrypted. The best option here is a messaging system with encrypted metadata, forward secrecy, and an enforced retention period – none of which is a feature of encrypted email.</p><p>What adversary is PGP protecting against? It’s not clear. The adversary has access to the ciphertext (otherwise PGP never enters the picture), which implies some significant real-world power – to wiretap an ISP, or compromise one of the mail providers<blog-footnote-ref>[<a href=#fn:9>9</a>]</blog-footnote-ref>. But they don’t have the power to compel Jane or John to decrypt their side of the thread? Outside of a novel, this particular combination of capabilities and limitations is rare.</p></blog-section><blog-section><h2>“Stop using encrypted email”</h2><blockquote><p>There are reasons people use and like email. We use email, too! It’s incredibly convenient. You can often guess people’s email addresses and communicate with them without ever being introduced. Every computing platform in the world supports it. Nobody needs to install anything new, or learn how to use a new system. Email is not going away.</p><p>[…]</p><p>Stop using encrypted email.</p></blockquote><p>It’s important to remember that this advice to avoid encrypted email is specifically about avoiding <i>encrypted</i> email. It’s not a problem to send ordinary, non-sensitive messages over email. Some messages will get logged forever by some shadowy agency. Some will get STARTTLS, as a treat. So long as messages needing secure handling are sent only over secured channels, it’s fine to send the rest over email or iMessage or LINE or WeChat or whatever.</p><p>Just please don’t PGP it.</p></blog-section><blog-footnotes slot=footnotes><hr><ol><li id=fn:1><p>See <a href=https://blog.filippo.io/the-sad-state-of-smtp-encryption/>The sad state of SMTP encryption</a> and <a href=https://aykevl.nl/2017/10/smtp-starttls>Enforced STARTTLS for SMTP</a>. The implementation of <code>MX</code> DNS records reduces STARTTLS to more of a public hygiene benefit than a strong security layer.</p></li><li id=fn:2><p>Google’s transparency report summarizes adoption of <a href=https://transparencyreport.google.com/safer-email/overview>email encryption in transit</a>.</p></li><li id=fn:3><p>As far as I know they’re still OK for authentication of plain-text unencrypted messages, if for some reason this is your use case.</p></li><li id=fn:4><p>I will admit I have a GPG key, and it’s published to the keyservers. I used to have its public part on a page of this website, in case someone needed to contact me on urgent Snow Crash business. I have never attended a key-signing party, but there was a time when I <i>would</i> have if the opportunity arose.</p></li><li id=fn:5><p>Parsers are a rich source of security problems.
When designing a serialization format, especially for a security-critical protocol, “challenging” is a synonym for “bad”.</p></li><li id=fn:6><p><a href=https://blog.cryptographyengineering.com/2014/08/13/whats-matter-with-pgp/>What’s the matter with PGP?</a></p></li><li id=fn:7><p><a href="https://news.ycombinator.com/item?id=20315633">https://news.ycombinator.com/item?id=20315633</a></p></li><li id=fn:8><p>My own personal workflow for GPG was command-line based; I would copy ciphertext out of my mail client into a terminal, decrypt it, encrypt the response, and then copy it back. This workflow is atypical; most users of GPG interact with it via a mail client plugin such as <a href=https://enigmail.net/index.php/en/>Enigmail</a>, which transparently decrypts messages in the mail client.</p></li><li id=fn:9><p>Another difference between 1990 and 2020 is that mail providers – hosted services in general, really – are more professionalized now. Apple, Google, and Microsoft have security departments larger than the entire headcount of a mid-90s ISP. For someone who is worried about the security of their mail provider, the best solution is to switch to one of these large and well-regarded options.</p></li></ol></blog-footnotes></blog-article>2020-02-22T03:40:51ZBy any other CNAME2020-02-05T10:25:15Zurn:uuid:9980c159-7b44-4026-a48b-1da601dbd384<style type=text/css scoped>a>img{border:2px solid;padding:2px}blockquote{border-left:10px solid #ccc;margin:.5em;padding:.1em .5em}p{line-height:1.5em}</style><blog-article posted=2020-02-05T10:25:15Z><h1 slot=title>By any other CNAME</h1><p>When an engineer joins Google, they are issued a workstation – a physical computer in tower<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref> form-factor that sits under their desk. Workstations have names, which the engineer gets to choose, and the fully-qualified hostname consists of this name plus an office-specific suffix. Workstations in Mountain View are in <code>.mtv.corp.google.com</code>, workstations in New York are in <code>.nyc.corp.google.com</code>, and so on – this is all tracked in various databases and synced to DNS. Intranet services that weren't office-specific, like the go/ URL shortener, were on <code>.corp.google.com</code> directly.</p><p><img src=https://john-millikin.com/by-sha256/08300bed6d03506eff8a806a4b27a2f8d9b572570fc8ed2a87f5ad918faaf1cc/google-hostnames.png style="margin:0 auto;display:block"></p><p>For convenience, Google uses a DNS feature named the <q>DNS search path</q> to let users reference workstations by short names. If I am sitting next to you, and I want to SSH into your workstation<blog-footnote-ref>[<a href=#fn:2>2</a>]</blog-footnote-ref>, I can type <code>ssh yourbox</code> instead of <code>ssh yourbox.mtv.corp.google.com</code> and it'll work. This wasn't only for workstation names; you could also use it for any sort of <code>.corp.google.com</code> name. In a browser you could type <code>http://go/somelink</code> and it would resolve to <code>http://go.corp.google.com/somelink</code>. In a PAM module you could <code>#define LDAP_ADDRESS "ldap"</code> and it would direct queries to <code>ldap.corp.google.com</code>. The halls rang with the sound of protobuf engineering, and all was at peace.</p><p>Some of you have noticed the problem.</p><p>One day I arrived at the office and discovered that I couldn't unlock my screen. This wasn't especially alarming, because nobody else in the building could log in and network outages happen sometimes.
But then news started trickling in over working comms<blog-footnote-ref>[<a href=#fn:3>3</a>]</blog-footnote-ref> that this wasn't a network outage. A few early risers had working desktop sessions, and the network was fine – only attempts to log in, SSH, or sudo were hanging.</p><p>Also, it was affecting the entire Mountain View campus.</p><p>The usual debugging process was followed with unusual haste and the issue was narrowed down to DNS. One of the new hires starting that day had requested their workstation be named <code>ldap</code>, per their initials, and as soon as that hostname hit the network it hijacked every LDAP client that had been configured to talk to <code>"ldap"</code>. Unfortunately that was a big 'every' because (1) if the wrong value is easier to type then it outcompetes the correct value, and (2) the chances of a misconfiguration being discovered are the inverse of how often it happens. So pretty much everything was broken.</p><p>This story has a happy ending because Google does regular disaster recovery tests. The tests are always something outlandish, like <q>aliens have invaded and all contact with California has been lost</q>, and everyone has a good laugh around the coffee robot. The recovery procedure for total DNS outage involves taking a laptop into the <i>panic room</i>, a locked room with a direct connection to the Prod network. This was done, the owners of the machine inventory were able to delete the bad record, and new safety checks were installed around the important hostnames.</p><blog-footnotes slot=footnotes><hr><ol><li id=fn:1><p>There was a brief experiment involving decommissioned Warp 19s, which were Google-designed rackmount machines with infamously sharp edges and the sound profile of a motorrad.</p></li><li id=fn:2><p>This is not as alarming as it sounds; workstations at Google are (were?) fairly untrusted, and it was common to SSH into a coworker's machine if your own was overloaded or doing software updates or whatever.</p></li><li id=fn:3><p>Engineers almost universally used IRC for quick conversations, team chatter, and coordinating incident response. Although Slack has some good points, I find myself missing good ol' port 6697 every time loading a channel makes my fan spin up.</p></li></ol></blog-footnotes></blog-article>2020-02-05T10:25:15ZSRE School: No Haunted Forests2018-11-01T06:19:20Zurn:uuid:e3ad5fad-6f1b-498b-b5c8-f4557d8b14d9<blog-article posted=2018-11-01T06:19:20Z><h1 slot=title>SRE School: No Haunted Forests</h1><div slot=tableofcontents></div><div style="float:left;margin:0 2em 2em 0"><img src=/sre-school/no-haunted-forests/322330_20181030192733_1.png style=max-width:400px><p style=margin-top:.5em><i>Engineer debugging a Puppet manifest (2018, colorized)</i></p></div><p>All industrial codebases contain bad code. To err is human, and situations get very human when you're staring down the barrel of a launch deadline. You've heard the euphemism <i>tech debt</i>, where like a car loan you hold a recurring obligation in exchange for immediate liquidity. But this is misleading: bad code is not merely overhead, it also reduces optionality for all teams that come in contact with it. Imagine being unable to get indoor plumbing because your neighbor has a mortgage!</p><p>Thus a better analogy for bad code is a haunted forest. Bad code negatively affects everything around it, so engineers will write ad-hoc scripts and shims to protect themselves from direct contact with the bad code. 
After the authors move to other projects, their hard work will join the forest.</p><p>Healthy engineering orgs do not tolerate the presence of haunted forests. When one is discovered you must move vigorously to contain, understand, and eradicate it.</p><p>Make this the motto of your team: No Haunted Forests!</p><blog-section><h2 slot=title>Identifying a Haunted Forest</h2><p>Not all intimidating or unmaintained codebases are haunted forests. It may be difficult for a newcomer to come up to speed on the code, or the code might be a stable implementation of some RFC. A couple rules of thumb to identify code worthy of a complete rewrite:</p><ul><li>Nobody at the company understands how the code should<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref> behave.</li><li>It is obvious to everyone on the team<blog-footnote-ref>[<a href=#fn:2>2</a>]</blog-footnote-ref> that the current implementation is not acceptable.</li><li>The project's missing features or erroneous behavior is impacting other teams.</li><li>At least one competent engineer has attempted to improve the existing codebase, and failed for technical reasons.</li><li>The codebase is resistant to static analysis, unit testing, interactive debuggers, and other fundamental tooling.</li></ul></blog-section><blog-section><h2 slot=title>Haunted Environmentalists</h2><p>Fresh graduates often push for a rewrite at the first sign of complexity, because they've spent the last four years in an environment where codebase lifetimes are measured in weeks. After their first unsuccessful rewrite they will evolve into Junior Engineers, repeating the parable of <a href=https://www.chesterton.org/taking-a-fence-down/>Chesterton's Fence</a> and linking to that old Joel Spolsky thunkpiece about Netscape<blog-footnote-ref>[<a href=#fn:3>3</a>]</blog-footnote-ref>.</p><p>Be careful not to confuse this reactive anti-rewrite sentiment with true objections to your particular rewrite. Remind them that Joel wrote that when source control meant <a href=https://en.wikipedia.org/wiki/Concurrent_Versions_System>CVS</a>.</p></blog-section><blog-section><h2 slot=title>Clearing Haunted Forests</h2><p>Rewriting an existing codebase should be modeled as a special case of a migration. Don't try to replace the whole thing at once: systematize how users interact with the existing code, insert strong API boundaries between subsystems, and make changes intentionally.</p><p><b>User Interaction</b> will make or break your rewrite. You must understand what the touch-points are for users of the existing system, so you can maintain <a href=https://danluu.com/ui-compatibility/>UI Compatibility</a>. Often rewrites mandate some changes, so try to put them all near the start (if you know what the final state should be) or delay them to the end (when you can make it seem like a big-bang migration). If the user-facing changes are significant, see if you can arrange for separate opt-in and opt-out periods during which both interaction modes co-exist.</p><p><b>Subsystem API Boundaries</b> let you carve up the old system into chunks that are easier to reason about. Be fairly strict about this: run the components in separate processes, separate machines, or whatever is needed to guarantee that your new API is the only mechanism they have to communicate. Do this recursively until the components are small enough that rewriting them from scratch is tedious instead of frightening.</p><p><b>Intentional Changes</b> happen when the new codebase's behavior is forced to deviate from the old. 
At this point you should have a good idea which behavior, if either, is correct. If there's no single correct behavior, it's fine to settle for "predictable" or (in the limit) "deterministic". By making changes intentionally you minimize the chances of forced rollbacks, and may even be able to detect users depending on the old behavior.</p><p>Work incrementally. A good rewrite is valid and fully functional at any given checkpoint, which might be commits or nightly builds or tagged releases. The important thing is that you never get into a state where you're forced to roll back a functional part of the new system due to breakage in another part.</p></blog-section><blog-section><h2 slot=title>Common Features of Haunted Forests</h2><p>All bad code is bad in its own special way, but there are some properties that are especially likely to make it hard to refactor incrementally. These are generally programming styles that hide state, obscure control flow, or permit type confusion.</p><p><b>Hidden State</b> means mutable <a href=https://en.wikipedia.org/wiki/Global_variable>global variables</a> and <a href=https://en.wikipedia.org/wiki/Scope_(computer_science)#Dynamic_scoping>dynamic scoping</a>. Both of these inhibit a reader's understanding of what code will do, and force them to resort to logging or debuggers. They're like catnip for junior developers, who value succinct code but haven't yet been forced to debug someone else's succinct code at 3 AM on a Sunday.</p><p><b>Non-Local Control Flow</b> prevents a reader from understanding what path execution will take. In the old times this meant <code>setjmp</code> and <code>longjmp</code>, but nowadays you'll see it in the form of callbacks and event loops. Python's <a href=https://en.wikipedia.org/wiki/Twisted_(software)>Twisted</a> and Ruby's <a href=https://en.wikipedia.org/wiki/EventMachine>EventMachine</a> can easily turn into global callback dispatchers, preventing static analysis and rendering stack traces useless.</p><p><b>Dynamic Types</b> require careful and thoughtful programming practices to avoid turning into "type soup". Highly magical metaprogramming like <code>__getattr__</code> or <code>method_missing</code> is trivially easy to abuse in ways that make even trivial bug fixes too risky to attempt. Tooling such as <a href=http://mypy-lang.org/>Mypy</a> and <a href=https://flow.org/>Flow</a> can help here, but introducing them into an existing haunted forest is unlikely to have significant impact. Use them in the new codebase from the start, and they might be able to reclaim portions of the original code.</p><p><b>Distributed Systems</b> can become haunted forests through sheer size, if no single person is capable of understanding the entire API surface they provide. Note that microservices don't automatically prevent this, because merely splitting up a monolith turns the internal structure into API surface. Each of the above per-process issues has distributed analogues, for example S3 is global mutable state and JSON-over-HTTP is dynamically typed.</p></blog-section><blog-footnotes slot=footnotes><hr><ol><li id=fn:1><p>A codebase where nobody knows what behavior it <i>currently has</i> is materially different from one where nobody understands what behavior it <i>should have</i>. 
The former doesn't need to be rewritten, because you can grind its test coverage up and then safely refactor.</p></li><li id=fn:2><p>You will sometimes hear objections from people who have not worked directly on the bad code, but have opinions about it anyway. Let them know that they're welcome to help out and you can arrange for a temporary rotation into the role of Forest Ranger.</p></li><li id=fn:3><p>The <i>real</i> reason Netscape failed is they wrote a dreadful browser, then spent three years writing a second dreadful browser. The fourth rewrite (Firefox) briefly had a chance at being the most popular browser, until Google's rewrite of <a href=https://en.wikipedia.org/wiki/Konqueror>Konqueror</a> took the lead. The moral of this story: rewrites are a good idea if the new version will be better.</p></li></ol></blog-footnotes></blog-article>2018-11-01T06:19:20Z(More) Effective Go2018-08-05T15:13:29Zurn:uuid:8c12efbf-7f85-49dd-9fae-b79d71d51422<blog-article posted=2018-08-05T15:13:29Z><h1 slot=title>(More) Effective Go</h1><blog-section><h2 slot=title>Unbounded Iteration</h2><p>"Unbounded iteration" is when you need to iterate over a sequence without knowing its total length. For example, receiving rows from a database query or data chunks from an HTTP response. Other languages have a native concept of "iterators" such that iterating over an array and a stream uses the same syntax, but Go doesn't do this.</p><p>The best approach to unbounded iteration in Go is a callback:</p><blog-code syntax=go><pre>
func stream(cb func(int)) {
	for _, x := range []int{1, 2, 3} {
		cb(x)
		time.Sleep(time.Second)
	}
}

func main() {
	stream(func(x int) {
		fmt.Println(x)
	})
}
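</pre></blog-code><p>If the callback later needs to report failures or support early exit, the signature extends naturally. A sketch of such a variant (my illustration; an error-returning callback is one of several reasonable conventions):</p><blog-code syntax=go><pre>
// stream stops iterating as soon as the callback reports an error.
func stream(cb func(int) error) error {
	for _, x := range []int{1, 2, 3} {
		if err := cb(x); err != nil {
			return err
		}
		time.Sleep(time.Second)
	}
	return nil
}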
</pre></blog-code><p>Some developers new to Go may try to use a channel and background thread for unbounded iteration. Don't do this:</p><blog-code syntax=go><pre>
func stream() <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for _, x := range []int{1, 2, 3} {
			ch <- x
			time.Sleep(time.Second)
		}
	}()
	return ch
}

func main() {
	for x := range stream() {
		fmt.Println(x)
	}
}
</pre></blog-code><p>Threads and thread-safe communication are cheap in Go, but not free. They add runtime and mental overhead – you need to think about the lifetime of any temporary channels backing your loops, and make sure they get properly drained so their backing threads can terminate.</p><p>The channel approach is also more difficult to extend. If a callback needs to change to return an error, or a non-error early exit, it's straightforward to add a return type. Channels have no mechanism to return data from the receiver.</p></blog-section><blog-section><h2 slot=title>Option Interfaces</h2><p>Keyword arguments ("kwargs") are commonly used in other languages for passing optional parameters to a complicated API. Go doesn't have kwargs, and they can be awkward to imitate using an "options struct" because the receiver can't easily tell whether an option was explicitly set to its zero value:</p><blog-code syntax=go><pre>
type Options struct {
	ConcurrencyToken uint32
}

func Fetch(opts Options) {
	if opts.ConcurrencyToken == 0 {
		// is this fetch being run without a concurrency token? or did the
		// caller set a token, but it happens to be 0x00000000 ?
	}
}
</pre></blog-code><p>Option interfaces let the options themselves be defined by functions, so that presence/absence, validation, and complex defaults are expressed naturally (at the cost of increased boilerplate):</p><blog-code syntax=go><pre>
type Option interface {
	apply(*options)
}

type fnOption func(*options)

func (fn fnOption) apply(opts *options) { fn(opts) }

type options struct {
	concurrencyToken *uint32
}

func ConcurrencyToken(token uint32) Option {
	return fnOption(func(opts *options) {
		opts.concurrencyToken = &token
	})
}

func Fetch(opts ...Option) {
	appliedOpts := options{}
	for _, opt := range opts {
		opt.apply(&appliedOpts)
	}
	if appliedOpts.concurrencyToken == nil {
		// definitely being run without a concurrency token
	}
}
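</pre></blog-code><p>Callers then pass only the options they care about, and a zero-valued token is no longer ambiguous:</p><blog-code syntax=go><pre>
Fetch()                    // no concurrency token
Fetch(ConcurrencyToken(0)) // explicitly token 0x00000000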
</pre></blog-code></blog-section><blog-section><h2 slot=title>Prefer POSIX Flags</h2><p>Go's standard library contains a <code>flag</code> package for parsing command-line flags. It uses Plan 9 flag semantics, which are alien to advanced users with a Linux, UNIX, or Windows background (i.e. all of them).</p><p>The <a href=https://godoc.org/github.com/spf13/pflag><code>"github.com/spf13/pflag"</code></a> package is API-compatible with the stdlib <code>flag</code> package, has extra API for features like "short" flags, and can automatically import flag definitions from libraries that use the stdlib.</p><blog-code syntax=go><pre>
import (
	goflag "flag"
	flag "github.com/spf13/pflag"
)

func main() {
	flag.CommandLine.AddGoFlagSet(goflag.CommandLine)
	flag.Parse()
}
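</pre></blog-code><p>As a taste of the extra API, pflag's <code>*P</code> variants accept a one-letter "short" name alongside the long one:</p><blog-code syntax=go><pre>
// Usable as either --config-path or -c.
configPath := flag.StringP("config-path", "c", "", "path to the config file")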
</pre></blog-code></blog-section><blog-section><h2 slot=title>Dynamic Flag Defaults</h2><p>Sometimes you'll want a command-line flag with a default value that can't be hardcoded, like <code>--config-path</code> that defaults to somewhere in the user's home directory. A common pattern is to let the flag's zero value mean "use computed default", but this makes <code>--help</code> output less useful.</p><p>It's better to use a computed value for the flag's default at definition time, then (1) <code>--help</code> will show that default value and (2) code consuming the flag doesn't need to special-case it.</p><blog-code syntax=go><pre>
var configPath = flag.String("config-path", defaultConfigPath(), "[your wonderful documentation here]")

func defaultConfigPath() string {
	path := os.ExpandEnv("${HOME}/.config/my-client-config")
	if _, err := os.Stat(path); err == nil {
		return path
	}
	return ""
}
</pre></blog-code></blog-section><blog-section><h2 slot=title>Errors Should Include Stack Traces</h2><p>If your code constructs errors with <code>fmt.Errorf()</code> or similar standard library functions, you're implicitly dropping the stack trace of where that error happened. Prefer to use the <a href=https://godoc.org/github.com/pkg/errors><code>"github.com/pkg/errors"</code></a> package, which records stack traces when the error is created and can preserve them as explanatory text is added in callers.</p><p>Custom error types can also use this library to obtain and propagate stack traces:</p><blog-code syntax=go><pre>
import "github.com/pkg/errors"
type myCustomError struct {
code int32
trace errors.StackTrace
}
func (err *myCustomError) StackTrace() errors.StackTrace {
return err.trace
}
func fail(code int32) error {
trace := errors.New("").(stackTrace).StackTrace()
return &myCustomError{
code: code,
trace: trace[1:],
}
}
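</pre></blog-code><p>At the call site, <code>errors.Wrap()</code> adds context while preserving the original trace, and the <code>%+v</code> verb prints the whole thing. A small usage sketch (<code>fmt</code> import omitted, as in the snippets above):</p><blog-code syntax=go><pre>
if err := fail(3); err != nil {
	err = errors.Wrap(err, "loading config")
	fmt.Printf("%+v\n", err) // message, cause, and stack trace
}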
</pre></blog-code></blog-section><blog-section><h2 slot=title>Avoid Mutable Globals</h2><p>This is standard good programming practice, but I want to specifically call it out here because the Go standard library is full of these things. You must be careful.</p><p>For example, the <code>net/http</code> package has functions <code>Handle()</code>, <code>ListenAndServe()</code>, etc. that operate on <code>http.DefaultServeMux</code>. You don't want to use these. Prefer to explicitly create your own <code>*http.ServeMux</code> and pass it around as an explicit parameter. Then when you want to write tests you won't need to go back and figure out all the places you're poking at mutable global state.</p><blog-code syntax=go><pre>
// BAD
http.Handle("/foo", fooHandler)
http.ListenAndServe(":8080", nil)

// GOOD
mux := http.NewServeMux()
mux.Handle("/foo", fooHandler)
http.ListenAndServe(":8080", mux)
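</pre></blog-code><p>The payoff shows up in tests, where each test can build its own mux and server without touching global state. A sketch (reusing <code>fooHandler</code> from above; imports of <code>testing</code> and <code>net/http/httptest</code> omitted):</p><blog-code syntax=go><pre>
func TestFooHandler(t *testing.T) {
	mux := http.NewServeMux()
	mux.Handle("/foo", fooHandler)
	srv := httptest.NewServer(mux)
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/foo")
	if err != nil {
		t.Fatal(err)
	}
	resp.Body.Close()
}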
</pre></blog-code></blog-section><blog-section><h2 slot=title>Don't Mutate or Invalidate Parameters</h2><p>This is a hard rule for your public API. It's also a helpful rule to follow in private APIs, but you don't <i>need</i> to if you're willing to accept the risk of weird bugs.</p><p>A public function defined like <code>Listen(addrs []string)</code> shouldn't mutate or invalidate the value passed in for <code>addrs</code>:</p><blog-code syntax=go><pre>
// BAD!
func Listen(addrs []string) {
	for ii, addr := range addrs {
		addrs[ii] = addr + ":1234"
	}
}

// BAD!
func Listen(addrs []string) {
	addrs = append(addrs, "localhost:1234")
}
</pre></blog-code><p>If you need to make adjustments to a user-provided value, copy it first:</p><blog-code syntax=go><pre>
func Listen(addrs []string) {
	addrs = append([]string{}, addrs...)
	// safe
	addrs = append(addrs, "localhost:1234")
}
</pre></blog-code></blog-section></blog-article>2018-08-05T15:13:29ZError Beneath the WAVs2018-08-04T23:52:33Zurn:uuid:b4c81f06-faeb-4803-95a5-770b1bcf9aa6<blog-article posted=2018-08-04T23:52:33Z updated=2018-08-11T06:15:47Z><h1 slot=title>Error Beneath the WAVs</h1><div slot=summary><p>This is a follow-up to <a href=/🤔/why-i-ripped-the-same-cd-300-times>Why I Ripped The Same CD 300 Times</a>. By the end of that page I'd identified a fragment of audio data that could cause read errors even if it was isolated and burned to a fresh CD. Further testing yielded a "cursed WAV" that consistently prevents perfect rips on different brands of optical drive, ripping software, and operating system.</p><p><b>EDIT (2018-08-10):</b> It worked! With the power of the two-sheep LTR-40125S I can successfully rip the original discs, with bit-exact audio data and a matching AccurateRip report.</p><p><b>🐻 превед!</b> Arrived from IXBT? This is similar to the post «Магия чисел», but the source of errors is a little different. Please see the <a href=#cdda-vs-cd-rom>CDDA vs CD-ROM</a> section.</p></div><blog-section><h2 slot=title>“History became legend. Legend became myth.”</h2><div style="float:right;padding:0 0 0 2em"><img src=/🤔/error-beneath-the-wavs/gandalf.jpg></div><p>The root cause would have been forever mysterious and unknown to me, but for <a href="https://news.ycombinator.com/item?id=17658515">this Hacker News comment</a> by <a href="https://news.ycombinator.com/user?id=userbinator">userbinator</a>
<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref>:</p><blockquote>It is likely "weak sectors", the bane of copy protection decades ago and of which plenty of detailed articles used to exist on the 'net, but now I can find only a few:<br><br><a href=http://ixbtlabs.com/articles2/magia-chisel/index.html>http://ixbtlabs.com/articles2/magia-chisel/index.html</a><br><a href=https://hydrogenaud.io/index.php/topic,50365.0.html>https://hydrogenaud.io/index.php/topic,50365.0.html</a><br><a href=http://archive.li/rLugY>http://archive.li/rLugY</a><br></blockquote><p>This page explores how "weak sectors" are caused by bad encoding logic in a CD burner. Probably what happened is the artist gave the factory a master on CD-R, which had been burned on a drive with affected firmware. The master contained the bad EFM encoding and was accurately duplicated into the pressed CDs.</p></blog-section><blog-section><h2 slot=title>Eight-to-Fourteen Modulation</h2><div style="float:left;padding:0 2em 1em 0"><div><img src=/🤔/error-beneath-the-wavs/efm.jpg style=max-width:350px></div></div><p>Physically a CD's data track is a spiral of "pits" and "lands", where at each clock cycle a transition is "1" and lack of transition is "0". Directly encoding data bytes in this format would cause some transitions to occur too quickly for a detector to track, so bytes are "stretched" to 14 bits using <a href=https://en.wikipedia.org/wiki/Eight-to-fourteen_modulation>eight-to-fourteen modulation</a>. The sequential "EFM codewords" are separated by three "merging bits", which are chosen by the writing device under two constraints:</p><ol><li>The bitstream may not have two consecutive 1s, or more than ten consecutive 0s.</li><li>The bitstream should avoid "<a href=https://en.wikipedia.org/wiki/DC_bias>DC bias</a>" by maintaining roughly equal counts of 1s and 0s.</li></ol><p>It appears that in the ~15 years between the optical disc's invention and the spread of home burning, knowledge of the EFM modulator's role in reducing DC bias was lost.</p><div style=clear:both></div><p><a href=https://patents.google.com/patent/US4603413>US patent US4603413</a> (granted 1986-07-29):</p><div style="float:right;padding:0 0 0 2em"><img src="/🤔/error-beneath-the-wavs/US4603413 figure 2.png" style=max-width:350px></div><blockquote><p>In order to maintain at least the minimum run length when the channel bits of successive symbols are merged into a single channel bit stream, at least two additional "merging bits" are added to the channel bits for each symbol. As a result of this, however, the digital sum value (DSV) of the channel bits of successive symbols may become appreciable, [...]</p><p>It has been found that under certain conditions, despite the addition of merging bits to minimize the d.c. 
unbalance (or DSV) of the channel bits, the DSV may become sufficiently significant to adversely affect read-out of the channel bits.</p></blockquote><div style=clear:both></div><p>Contrast with this quote from later material (circa 2002):</p><blockquote><p>The CD-Reader has trouble reading CD's with a high DSV, because (Not sure about this info, this is just an idea from Pio2001, a trusted source), the pits return little light when they are read.</p></blockquote></blog-section><blog-section><h2 slot=title>Come Sing Along With the Pirate Song</h2><div style="float:left;padding:0 2em 0 0"><iframe type=text/html width=320 height=180 src=https://www.youtube.com/embed/kY-pUxKQMUE frameborder=0></iframe></div><p>All of this would have remained an obscure detail of CD manufacturing until some Macrovision employee circa 2000 realized that consumer-grade CD burners didn't implement DSV scrambling. Instead of <a href=https://thenextweb.com/plugged/2016/03/30/amazon-bans-faulty-usb-c-cables-google-engineer-reviewed-hundreds/>responsibly reporting defective hardware</a>, they decided to build digital restrictions products around bit patterns known to trigger bad EFM modulation. The resulting data corruption could be used to detect pirated copies, with a minor side effect of <a href=https://forum.paradoxplaza.com/forum/index.php?threads/why-safedisc-2-is-a-flawed.4115/>preventing customers from using legally purchased software</a>.</p><p>The piracy scene named these difficult-to-modulate bit patterns "weak sectors".</p><div style=clear:both></div><p>userbinator's comment links to <a href=https://web.archive.org/web/20090603002402/http://sirdavidguy.coolfreepages.com/SafeDisc_2_Technical_Info.html>http://sirdavidguy.coolfreepages.com/SafeDisc_2_Technical_Info.html</a>, which contains a concrete example of a "weak sector" pattern:</p><div style="float:right;padding:0 0 0 2em"><img src=/🤔/error-beneath-the-wavs/dabab14de94000f608f6cf75f43aa5a26a6cb49d.png style=max-width:350px></div><blockquote><p>Feeding a regular bit pattern into the EFM encoder can cause a situation in which the merging bits are not sufficient to keep the DSV low. For example, if the EFM encoder were fed with the bit pattern "D9 04 D9 04 D9 04 D9 04 D9 04 D9 04 D9 04 D9 04 D9 04 D9 04 D9 04" [...]</p></blockquote><p>It also speculates that correct EFM modulation was abandoned due to performance concerns:</p><blockquote><p>The algorithm for calculating the merging bits is far too slow to be viable in an actual CD-Burner. Therefore, CD designers had to come up with their own algorithms, which are faster. The problem is, when confronted with the weak sectors, the algorithms cannot produce the correct the merging bits. This results in sectors filled with incorrect EFM-Codes. This means that every byte in the sector will be interpreted as a read error. The error correction is not nearly enough to correct every byte in the sector (obviously).</p></blockquote><p>This seems plausible: the early 2000s was a time of fierce competition between optical drive vendors, and speed was king. 
A drive that could only write at 4x would lose sales to its 16x competitors, even if its output was technically more correct.</p></blog-section><blog-section><h2 slot=title>Counting Sheep</h2><div style="float:left;padding:0 2em 1em 0"><img src=/🤔/error-beneath-the-wavs/sheep.png></div><p>The quality of a CD drive's EFM algorithm was of mostly academic interest when it came to the general population<blog-footnote-ref>[<a href=#fn:2>2</a>]</blog-footnote-ref>, but extremely important to pirates. As contemporary games' DRM schemes relied on more and more subtle properties of the physical media, pirates researched which drives were capable of duplicating them. A discerning pirate bought their CD burner based on its "sheep rating" (named after the <a href=https://en.wikipedia.org/wiki/CloneCD>CloneCD</a> mascot). Drives were "<a href="https://web.archive.org/web/20050307031140/http://club.cdfreaks.com/showthread.php?t=101608">sheep tested</a>" with data files generated by Alexander Noé's <a href=http://www.alexander-noe.com/weaksectors/index-eng.html>Weak Sector Utility</a>.</p><table><thead><tr><th>Sheep Rating</th><th>Capability</th></tr></thead><tbody><tr><td>0</td><td>Can't duplicate CDs containing weak sectors</td></tr><tr><td>1</td><td>Can duplicate CDs containing SafeDisc up to version 2.4.x</td></tr><tr><td>2</td><td>Can duplicate CDs containing SafeDisc up to version 2.5.x</td></tr><tr><td>3</td><td>Can duplicate CDs containing any possible weak sectors</td></tr></tbody></table><p>My next objective was to purchase a CD drive capable of burning and re-reading the original track. Such a drive would be able to rip the entire CD in one pass, thereby providing a clean rip log<blog-footnote-ref>[<a href=#fn:3>3</a>]</blog-footnote-ref>. But contemporary reviews of optical drives no longer include a sheep rating – copying a Blu-ray is a matter of cryptography rather than error correction gimmicks, and CD ripping doesn't drive ad impressions.</p><div style="float:right;padding:0 0 0 2em"><img src="/🤔/error-beneath-the-wavs/Lite-On LTR-40125S.jpg"></div><p>The best resource I found is <a href=https://web.archive.org/web/20050825194656/http://www.makeabackup.com/burners.html>makeabackup.com/burners.html</a>, which contains lists of optical drives categorized by sheep rating. The brand I found mentioned most commonly in archived piracy forums was Plextor, so I was surprised to see no Plextor drives on the 2-sheep list. Instead, I bought a Lite-On LTR-40125S<blog-footnote-ref>[<a href=#fn:4>4</a>]</blog-footnote-ref>. Once it arrives I'll expand this page with my findings (check back on 2018-08-11).</p><p><b>EDIT (2018-08-10):</b> It worked! With the power of the two-sheep LTR-40125S I can successfully rip the original discs, with bit-exact audio data and a matching AccurateRip report.</p><p>I found several references to "three-sheep burners" as a semi-mythical achievement, but no concrete evidence of such a device ever being sold. It's possible that the Yamaha CRW3200 in "Audio Master Quality" mode might have been able to duplicate certain discs at a three-sheep level, by writing physically larger data tracks at the cost of reduced capacity<blog-footnote-ref>[<a href=#fn:5>5</a>]</blog-footnote-ref>. 
Since the high DC bias manifests as tracking errors, a disc that's easier to track may be the solution.</p></blog-section><blog-section><h2 slot=title>Making WAVs</h2><p>If you've been following along at home with your favorite hex editor, you'll notice that <a href=https://github.com/jmillikin/john-millikin.com/blob/master/%F0%9F%A4%94/why-i-ripped-the-same-cd-300-times/minimal.flac>the "cursed" portion of the original audio</a>
<blog-footnote-ref>[<a href=#fn:6>6</a>]</blog-footnote-ref> has very long sequences with a <code>__ 0x04 __ 0x04</code> pattern. This matches the <code>0xD9 0x04 0xD9 0x04</code> sample above. Was there something special about 0x04? Was there a correlation between weak sectors and EFM patterns? To answer these questions I generated a synthetic test file, containing single <a href=https://en.wikipedia.org/wiki/Track_(optical_disc)#Audio_tracks>CDDA frames</a> full of a suspect pattern, joined by long runs of 0x00 padding<blog-footnote-ref>[<a href=#fn:7>7</a>]</blog-footnote-ref>. Burning and ripping the "cursed wav" verified that some of these patterns were sufficient on their own to cause rip errors<blog-footnote-ref>[<a href=#fn:8>8</a>]</blog-footnote-ref>.</p><p>I wasn't able to figure out how to predict the behavior of a particular byte pattern. Of the patterns tested, many were harmless (or at least didn't affect any of my drives). Stretches of identical bytes were also harmless, so it wasn't <i>just</i> repetition in play. Given the difficulty of measuring voltage levels in an optical drive's ICs, it's likely I'll never figure out exactly why particular patterns cause errors.</p><div style="float:left;padding:0 2em 0 0"><div><img src="/🤔/error-beneath-the-wavs/Screenshot from 2018-08-04 15-33-24.png" style=max-width:350px></div></div><blog-code syntax=python style=display:inline-block><pre>
# Python 2; the wave module is fed raw byte strings.
import contextlib, wave

FRAME_BYTES = 2352
SILENCE = "\x00" * (FRAME_BYTES * 3)

with contextlib.closing(wave.open("all-cursed.wav", "w")) as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(44100)
    out.writeframes(SILENCE)
    for b1 in range(0, 255):
        for b2 in range(0, 255):
            # One CDDA frame of the candidate pattern, then padding.
            frame = "%s\x01%s\x01" % (chr(b1), chr(b2))
            out.writeframes(frame * (FRAME_BYTES / len(frame)))
            out.writeframes(SILENCE)
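</pre></blog-code><p>For intuition about what "driving the DSV high" means: DSV is the running difference between time spent at the two signal levels, where each "1" channel bit is a transition. A toy sketch of the accumulation (in Go; illustrative only – real drives track this on the EFM channel bits, after merging bits are chosen):</p><blog-code syntax=go><pre>
// dsv computes the digital sum value of a channel-bit stream: each "1"
// toggles the pit/land level, and every clock period adds +1 or -1
// depending on the current level. A balanced stream stays near zero.
func dsv(channelBits []byte) int {
	sum, level := 0, 1
	for _, bit := range channelBits {
		if bit == 1 {
			level = -level
		}
		sum += level
	}
	return sum
}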
</pre></blog-code></blog-section><blog-section><h2 slot=title>CDDA vs CD-ROM</h2><p>The issue of repeated patterns causing silent data corruption appears to be specific to audio CDs (CDDA) – SafeDisc operated by detecting their presence, not by preventing the rip. However, silent data corruption could affect data CDs (CD-ROM) as documented by <a href="http://forum.ixbt.com/topic.cgi?id=31:17979">a forum post on IXBT</a> (summaries: <a href=https://www.ixbt.com/optical/magia-chisel.shtml>Russian</a>, <a href=http://ixbtlabs.com/articles2/magia-chisel/index.html>English</a>) – specific sequences of bytes can be mistaken for the sector synchronization header, and cause portions of individual files on an optical disc to become unreadable.</p><div style="float:right;padding:0 0 1em 2em"><img src=/🤔/error-beneath-the-wavs/scrambler1.png style=max-width:350px></div><blockquote><p>Over 2 / 3 of the drives tested fail to read the file if it contains a signature that turns into a data sequence identical to Sync Header! Except Toshiba and HP, all manufacturers use sync header as a key sign of the sector start at data reading.</p></blockquote><p>The behavior users see is a little different. For a data CD, the sequence must scramble to exactly match the sector sync header. In audio CDs, the responsible byte patterns must be repeated many times to drive the DSV value high enough to force read errors.</p><div style=clear:both></div></blog-section><blog-footnotes><hr><ol><li id=fn:1><p>"Only a few" was an understatement. What looks to have been a flourishing community of music archivists and game pirates has nearly vanished from the Internet, losing years of reports and research about how CD error handling works in real-world conditions. This writeup was made possible by Internet Archive.</p></li><li id=fn:2><p>A typical high-end hard drive of that era might store around 100 GB of data. Music was typically ripped to MP3 or AAC; there was no point in worrying about the exact value of bits fed into a lossy encoder.</p></li><li id=fn:3><p>All of my personal CD rips are archived along with their rip log, so I can verify the audio CRCs during backup tests.</p></li><li id=fn:4><p>This thing's a blast from the past. The CD ripping scene imploded before SATA had reached optical drives, so it's got PATA pins and one of those four-pin Molex power sockets.</p></li><li id=fn:5><p>I found several reviews praising how AMQ discs skipped less often, but no reviews specifically about its capabilities with weak sectors. This is unsurprising given that the capacity differences would render it unusable for software piracy.</p></li><li id=fn:6><p>I say "original", but it turns out that's wrong at this level of detail. I used <a href=https://www.audacityteam.org/>Audacity</a> to cut out that section, because I assumed it was capable of moving bytes from one lossless file to another without changing those bytes. Not so! Audacity will happily <i>mangle the shit</i> out of audio data. You can verify this by opening a .wav, writing it back out, and comparing the two files. It's also fun to look at the spectrogram of an "empty" audio file that got passed through Audacity.</p></li><li id=fn:7><p>I also ran tests with other padding bytes, to determine that the corruptions were caused by inter-frame copying (vs dumb "zero out bad bytes" logic).</p></li><li id=fn:8><p>A variant wrote patterns to separate short tracks. Ripping these with the Lite-On in EAC reported "OK" CRCs for all of the tracks, despite being very obviously mangled. 
This is alarming behavior from a ripping program designed to detect corrupt data. I suspect there's an edge condition in the cache busting that doesn't work right when tracks are very short.</p></li></ol></blog-footnotes></blog-article>2018-08-04T23:52:33ZWhy I Ripped The Same CD 300 Times2018-07-30T23:54:21Zurn:uuid:0d33134a-13e0-4a5c-83bc-9e9d698ca14c<blog-article posted=2018-07-30T23:54:21Z updated=2018-08-11T06:15:47Z><h1 slot=title>Why I Ripped The Same CD 300 Times</h1><div slot=summary><p>I collect music by buying physical CDs, digitizing them with <a href=http://www.exactaudiocopy.de>Exact Audio Copy</a>, and scanning the artwork. This is sometimes challenging if the CD was self-published in a limited run in a foreign country ten years ago. It is very challenging if the CDs have an innate defect that renders some tracks unreadable.</p><p>(Русский перевод: <a href=https://habr.com/post/418995/>Зачем я рипнул один компакт-диск 300 раз</a>)</p><p><b>UPDATE (2018-08-10):</b> See the follow-up post <a href=/🤔/error-beneath-the-wavs>Error Beneath the WAVs</a> for more info about what exactly is wrong with my discs, and info about which CD drives are capable of reading them. I got a perfect rip by upgrading to a "two-sheep" CD drive.</p></div><blog-section id=kaerubeki-shiro><h2 slot=title>帰るべき城</h2><img src=/🤔/why-i-ripped-the-same-cd-300-times/shiro_jacket.jpg style="float:left;margin:0 2em 2em 0"><p>The piano arrangement album <a href=https://altneuland.net/gallery/the-citadel-where-to-return/>帰るべき城</a> by <a href=https://altneuland.net/>Altneuland</a> was published in 2005. I discovered it in 2008 (probably on YouTube), downloaded the best copy I could find, and filed it away in the TODO list. Recent advances in international parcel forwarding technology let me buy a used copy last year, but when it arrived none of my CD drives could read track #3. This sort of thing is common when buying used CDs, especially if they need to transit a USPS international shipping center. I shelved it and kept on the lookout for another copy, which I located last month. It arrived on Friday, I immediately tried to rip it, and hit the <i>exact same error</i>. This didn't seem to be an issue of wear or damage – the CD itself was probably defective from the factory.</p><p>I had three choices: accept an imperfect rip in my archives<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref>, hope to find another copy some day that would rip successfully (unlikely), or somehow regenerate the original audio data from my corrupt copies. You already know which branch I took.</p></blog-section><blog-section><h2 slot=title>How Ripping Works</h2><div style="float:right;padding:0 0 2em 2em"><img src="/🤔/why-i-ripped-the-same-cd-300-times/Screenshot from 2018-07-30 09-20-48.png" style=max-width:350px><p style=margin-top:0><i>EAC failing to read track #3 of 「帰るべき城」</i></p></div><p>CDs store digital data, but the interface between CDs, lasers, and optical diodes is very analog. Read errors can be caused by anything from dirty media, to scratches on the protective polycarbonate layer, to vibration from the optical drive itself. The primitive error correction codes in the <a href=https://en.wikipedia.org/wiki/Compact_Disc_Digital_Audio>CDDA standard</a>, designed to minimize audible distortions on lightly used disks, are not capable of fully recovering the bitstream on CDs with a significant error rate. 
Contemporary CD ripping software works around this with two important error detection techniques: redundant reads and AccurateRip.</p><p>The page <a href=http://www.exactaudiocopy.de/en/index.php/overview/basic-technology/extraction-technology/>EAC: Extraction Technology</a> describes EAC's approach to redundant reads:</p><blockquote><p>In secure mode this program either reads every audio sector at least twice [...] If an error occurs (read or sync error), the program keeps on reading this sector, until eight of 16 retries are identical, but at maximum one, three or five times (according to the selected error recovery quality) these 16 retries are read. So, in the worst case, bad sectors are read up to 82 times!</p></blockquote><p>Simple enough. If a read request sometimes returns bad data, read everything twice, and then be extra careful if the first two reads didn't match. <a href="https://wiki.hydrogenaud.io/index.php?title=AccurateRip">AccurateRip</a> is the same principle, but distributed – it's a service to which rippers can submit checksums of their ripped audio files. The idea is that if you rip a track and see that a thousand other people got the same bits for the same track, then your rip was probably good.</p><p>This article is about what happens when both techniques fail. EAC can't make progress if every single read returns different data, and because the disc is rare, the AccurateRip database only had a single entry<blog-footnote-ref>[<a href=#fn:2>2</a>]</blog-footnote-ref>.</p></blog-section><blog-section><h2 slot=title>“I walked ten thousand aisles, ten thousand aisles to see you”</h2><div style="max-width:350px;float:left;margin:0 2em 2em 0"><img src=/🤔/why-i-ripped-the-same-cd-300-times/IMG_20180730_123418.jpg style=max-width:350px><p style=margin-top:0><i>Optical drives from Asus, LG, Lite-On, Pioneer, and an unknown OEM</i></p></div><p>A practical solution to CDs that won't rip is to use a different drive. Sometimes a particular model is especially lenient with the CDDA spec or has better error correction firmware or whatever. The DBpoweramp forums maintain a <a href=https://forum.dbpoweramp.com/showthread.php?37706-CD-DVD-Drive-Accuracy-List-2016>CD/DVD Drive Accuracy List</a> for rippers to select a good drive.</p><p>On Saturday morning I bought five new CD drives from different brands<blog-footnote-ref>[<a href=#fn:3>3</a>]</blog-footnote-ref>, tried rips on all of them, and found one that could maintain sync through the broken track. Unfortunately the confirmation rip failed to verify – there were about 20,000 bytes different between each rip.</p><p>But now I had a .wav file sitting on disk, and a way to get more of them. Reasoning that the read errors on a bad disk will fluctuate around the "correct" value, I figured I'd rip it a couple times, find the most "voted" value for unstable bytes, and use that as a correction. 
This approach was ultimately successful, but was far more work than I expected.</p></blog-section><blog-section><h2 slot=title>“Quantity has a quality all its own”</h2><div style="max-width:350px;float:right;margin:0 2em"><div><a href="/🤔/why-i-ripped-the-same-cd-300-times/Screenshot from 2018-07-30 15-24-15.png"><img src="/🤔/why-i-ripped-the-same-cd-300-times/Screenshot from 2018-07-30 15-24-15.png" style=max-width:350px></a></div><p style=margin-top:0><i>Corrected and uncorrectable errors per rip count</i></p></div><p>I started by ripping one of the CDs repeatedly, recording all the values for each byte, and declaring an error "correctable" if more than half of the rips had used a particular byte value at that position. Initial behavior was good: the number of uncorrectable errors dropped from ~6900 bytes at N=4 to ~5000 bytes at N=10. The per-rip benefit slowly decreased over time, until at around N=80 the uncorrectable error count stabilized at ~3700. I stopped ripping at N=100.</p><div style=clear:both></div><div style="max-width:350px;float:right;margin:0 2em"><div><a href="/🤔/why-i-ripped-the-same-cd-300-times/Screenshot from 2018-07-30 15-24-20.png"><img src="/🤔/why-i-ripped-the-same-cd-300-times/Screenshot from 2018-07-30 15-24-20.png" style=max-width:350px></a></div><p style=margin-top:0><i>Same, but for two disks with cross-checked corrections</i></p></div><p>Next I tried ripping the second CD 100 times and using the two correction maps to "fill in" uncorrectable error positions in the other disk. This was a failure: each disk had thousands of corrections that disagreed with corrections on the other disk! It turns out you can't fix noisy data by intersecting it with a different-but-related noise source.</p></blog-section><blog-section><h2 slot=title>Arts and Crafts</h2><div style="max-width:350px;float:left;margin:0 2em 2em 0"><img src=/🤔/why-i-ripped-the-same-cd-300-times/IMG_20180730_104806.jpg style=max-width:350px></div><p>The EAC site has another nice resource: the <a href=http://www.exactaudiocopy.de/en/index.php/other-projects/dae-quality/>DAE Quality Test</a>, which quantifies the error correction capability of an optical disk drive's firmware. This is a different, lower-level type of error handling that can <i>fix</i> read errors instead of merely reporting them. The catch is that EAC's "secure mode" works by disabling or avoiding this built-in correction code, on the assumption that it doesn't work well.</p><p>The test is prepared by burning a provided waveform to a CD-R, cutting some divots in the data surface, then carefully coloring part of it with black marker. That's it – guaranteed unrecoverable errors in a deterministic pattern.</p><p>I ran the test on all of the drives, obtaining two interesting results:</p><div style=clear:both></div><div style="float:left;margin:0 2em 2em 0"><img src=/🤔/why-i-ripped-the-same-cd-300-times/dae-liteon.png style=max-width:350px></div><p>The Lite-On drive here is what I used to get past the sync error. It happily chews through the magic marker, but gets <i>really</i> confused by straight lines cut in the data surface. You can see how what should be three distinct peaks on the right get merged into one giant error blob.</p><pre>
Errors total Num : 206645159
Errors (Loudness) Num : 965075 - Avg : -21.7 dB(A) - Max : -5.5 dB(A)
Error Muting Num : 154153 - Avg : 99.1 Samples - Max : 3584 Samples
Skips Num : 103 - Avg : 417.3 Samples - Max : 2939 Samples
Total Test Result : 45.3 points (of 100.0 maximum)
</pre><div style=clear:both></div><div style="float:left;margin:0 2em 2em 0"><img src=/🤔/why-i-ripped-the-same-cd-300-times/dae-pioneer.png style=max-width:350px></div><p>The Pioneer drive scored the highest on the DAE test. To my eye the chart doesn't look like anything special, but the analysis tool judged it the best error-correction firmware in my little fleet.</p><pre>
Errors total Num : 2331952
Errors (Loudness) Num : 147286 - Avg : -77.2 dB(A) - Max : -13.2 dB(A)
Error Muting Num : 8468 - Avg : 1.5 Samples - Max : 273 Samples
Skips Num : 50 - Avg : 6.5 Samples - Max : 30 Samples
Total Test Result : 62.7 points (of 100.0 maximum)
</pre></blog-section><blog-section><h2 slot=title>“At some point numbers do count”</h2><div style="max-width:350px;float:right;margin:0 2em"><div><a href="/🤔/why-i-ripped-the-same-cd-300-times/Screenshot from 2018-07-30 16-03-18.png"><img src="/🤔/why-i-ripped-the-same-cd-300-times/Screenshot from 2018-07-30 16-03-18.png" style=max-width:350px></a></div><p style=margin-top:0><i>Corrected and uncorrectable errors per rip count (Pioneer)</i></p></div><p>How can I use the Pioneer's good innate error handling when EAC's "secure mode" works by bypassing a drive's error logic? That's easy, switch EAC to "burst mode" and let it write the bits to disk just as the firmware reported them. How can we turn that heap of unchecked wavs into a file of "secure mode" quality? The same error analysis tooling built for the Lite-On rips!</p><p>A few EAC config tweaks and another hundred rips later, we get this beautiful chart. A few things to note:</p><ul><li>The uncorrectable bit errors quickly approach zero, but never quite get there.</li><li>There's a huge jump in corrected errors at the 53rd or 54th rip.</li><li>The error counts before and after that big jump have some flat areas, indicating areas of stability in the ripped data.</li></ul></blog-section><blog-section><h2 slot=title>0xA595BC09</h2><p>Using the nearly-perfect correction data from the Pioneer, I generated a "best guess" file and started comparing it to the Pioneer rips. As expected there were some bad outliers, which I fixed by ripping ten more times:</p><blog-code syntax=commands><pre>
for RIP_ID in $(seq -w 1 100); do echo -n "rip$RIP_ID: "; cmp -l analysis-out.wav rips-cd1-pioneer/rip${RIP_ID}/*.wav | wc -l ; done | sort -rgk2 | head -n 10
# rip054: 2865
# rip099: 974
# rip007: 533
# rip037: 452
# rip042: 438
# rip035: 404
# rip006: 392
# rip059: 381
# rip043: 327
# rip014: 323
</pre></blog-code><p>I also found something really interesting: a handful of rips had come out with <i>exactly</i> the same audio content! Remember that this is what the EAC "secure mode" is designed to test for as a success criterion. The <code>shncat -q -e | rhash --printf="%C"</code> snippet below is used to calculate the CRC32 checksum of the raw audio data, and it's what EAC uses.</p><blog-code syntax=commands><pre>
for wav in rips-cd1-pioneer/*/*.wav; do shncat "$wav" -q -e | rhash --printf="%C $wav\n" - ; done | sort -k1
# [...]
# 9DD05FFF rips-cd1-pioneer/rip059/rip.wav
# 9F8D1B53 rips-cd1-pioneer/rip072/rip.wav
# A2EA0283 rips-cd1-pioneer/rip082/rip.wav
# A595BC09 rips-cd1-pioneer/rip021/rip.wav
# A595BC09 rips-cd1-pioneer/rip022/rip.wav
# A595BC09 rips-cd1-pioneer/rip023/rip.wav
# A595BC09 rips-cd1-pioneer/rip024/rip.wav
# A595BC09 rips-cd1-pioneer/rip025/rip.wav
# A595BC09 rips-cd1-pioneer/rip026/rip.wav
# A595BC09 rips-cd1-pioneer/rip027/rip.wav
# A595BC09 rips-cd1-pioneer/rip028/rip.wav
# A595BC09 rips-cd1-pioneer/rip030/rip.wav
# A595BC09 rips-cd1-pioneer/rip031/rip.wav
# A595BC09 rips-cd1-pioneer/rip040/rip.wav
# A595BC09 rips-cd1-pioneer/rip055/rip.wav
# A595BC09 rips-cd1-pioneer/rip058/rip.wav
# AA3B5929 rips-cd1-pioneer/rip043/rip.wav
# ABAAE784 rips-cd1-pioneer/rip033/rip.wav
# [...]
</pre></blog-code><p>Setting that aside for now, re-ripping the outliers let the analysis complete with zero uncorrectable errors. And when I checked that file, it had the same audio content as the "common" rip!</p><p>I am 99% confident that I have correctly digitised this troublesome CD, with 0xA595BC09 being the correct CRC of track #3.</p></blog-section><blog-section><h2 slot=title>UPDATE: A Perfect Rip</h2><p>After a Hacker News comment pointed me in the right direction (see <a href=/🤔/error-beneath-the-wavs>Error Beneath the WAVs</a>), I bought another CD drive (this one's drive #6) that was reported to have better handling of this particular problem. It was able to successfully rip the original disc with no issues.</p><p>At last, victory!</p><img src="/🤔/why-i-ripped-the-same-cd-300-times/Screenshot from 2018-08-10 22-31-33.png"><pre>
Track 3
Filename C:\Archive\アルトノイラント - 帰るべき城 [ANL-001]\03 - The End of Theocratic Era.wav
Pre-gap length 0:00:02.00
Peak level 100.0 %
Extraction speed 8.2 X
Track quality 100.0 %
Test CRC A595BC09
Copy CRC A595BC09
Accurately ripped (confidence 1) [84B9DD1A] (AR v2)
Copy OK
</pre></blog-section><blog-section><h2 slot=title>Appendix A: compare.rs</h2><p>This is the tool I used to calculate suspected byte errors. It wasn't intended to live long so it's a bit ugly, but may be of interest to anyone who stumbles across this page with the same goal.</p><blog-code syntax=rust><pre>
// Usage: compare RIP1.wav RIP2.wav [RIP3.wav ...]
// Memory-maps every input file (all must be the same size), scans them
// in 1 MiB chunks across worker threads, and prints (as JSON) each byte
// position where any two inputs disagree, with each file's value there.
extern crate memmap;

use std::cmp;
use std::collections::HashMap;
use std::env;
use std::fs;
use std::sync;
use std::sync::mpsc;
use std::thread;

use memmap::Mmap;

const CHUNK_SIZE: usize = 1 << 20;

fn suspect_positions(
	mmaps: &HashMap<String, Mmap>,
	start_idx: usize,
	end_idx: usize,
) -> Vec<usize> {
	let mut positions = Vec::new();
	for ii in start_idx..end_idx {
		let mut first = true;
		let mut byte: u8 = 0;
		for (_file_name, file_content) in mmaps {
			if first {
				byte = file_content[ii];
				first = false;
			}
			else if byte != file_content[ii] {
				positions.push(ii);
				break;
			}
		}
	}
	positions
}

fn main() {
	let mut args: Vec<String> = env::args().collect();
	args.remove(0);
	let mut first = true;
	let mut size: usize = 0;
	let mut files: Vec<fs::File> = Vec::new();
	let mut mmaps: HashMap<String, Mmap> = HashMap::new();
	for filename in args {
		let file = fs::File::open(&filename).unwrap();
		files.push(file);
		let mmap = unsafe { Mmap::map(files.last().unwrap()).unwrap() };
		if first {
			first = false;
			size = mmap.len();
		} else {
			assert!(size == mmap.len());
		}
		mmaps.insert(filename, mmap);
	}
	let (suspects_tx, suspects_rx) = mpsc::channel();
	let mut start_idx = 0;
	let mmaps_ref = sync::Arc::new(mmaps);
	loop {
		let t_start_idx = start_idx;
		let t_end_idx = cmp::min(start_idx + CHUNK_SIZE, size);
		if start_idx == t_end_idx {
			break;
		}
		let mmaps_ref = mmaps_ref.clone();
		let suspects_tx = suspects_tx.clone();
		thread::spawn(move || {
			let suspects = suspect_positions(mmaps_ref.as_ref(), t_start_idx, t_end_idx);
			suspects_tx.send(suspects).unwrap();
		});
		start_idx = t_end_idx;
	}
	drop(suspects_tx);
	let mut suspects: Vec<usize> = Vec::with_capacity(size);
	for mut suspects_chunk in suspects_rx {
		suspects.append(&mut suspects_chunk);
	}
	suspects.sort();
	println!("{{\"files\": [");
	let mut first_file = true;
	for (file_name, file_content) in mmaps_ref.iter() {
		let file_comma = if first_file { "" } else { "," };
		first_file = false;
		println!("{}{{\"name\": \"{}\", \"suspect_bytes\": [", file_comma, file_name);
		for (ii, position) in suspects.iter().enumerate() {
			let comma = if ii == suspects.len() - 1 { "" } else { "," };
			println!("[{}, {}]{}", position, file_content[*position], comma);
		}
		println!("]}}");
	}
	println!("]}}");
}
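</pre></blog-code><p>The majority-vote step itself isn't part of this tool. A minimal sketch of the idea (in Go, and illustrative only, not the code I actually ran):</p><blog-code syntax=go><pre>
// vote returns the majority value at a suspect byte position, if any
// single value appears in more than half of the rips.
func vote(rips [][]byte, pos int) (byte, bool) {
	counts := make(map[byte]int)
	for _, rip := range rips {
		counts[rip[pos]]++
	}
	for value, n := range counts {
		if n*2 > len(rips) {
			return value, true
		}
	}
	return 0, false
}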
</pre></blog-code></blog-section><blog-footnotes><hr><ol><li id=fn:1><p>I've had a couple people ask "what's so special about track #3 that you would go through this?". Funny thing – track #3 was the only non-piano track on this album, and I actually don't like it very much! I will probably not listen to it often, or ever! This effort was not about the music itself, but about making a perfect copy of an ephemeral physical artifact.</p></li><li id=fn:2><p><s>That single AccurateRip entry for this album matched my CRCs for all tracks <i>except</i> track #3 – they had 0x84B9DD1A, vs my result of 0xA595BC09. I suspect that original ripper didn't realize their disk was bad.</s><ul><li>My original footnote here was wrong. The AccurateRip report is correct (and matches my result) – I didn't realize AccurateRip uses a different checksum than EAC. The perfect rip reports an AccurateRip match on all tracks.</li></ul></p></li><li id=fn:3><p>The obvious question when buying a CD- or DVD-ROM drive, here in the year 2018, is "lol where?". And I didn't want just one, I wanted <i>several</i>, from <i>different brands</i>. There is only one bricks-and-mortar store I know of that would have an inventory of 5.25" DVD drives. Only one that's big enough to spare the shelf space but crufty enough that they wouldn't be out of place. I speak, of course, of Fry's Electronics.</p></li></ol></blog-footnotes></blog-article>2018-07-30T23:54:21ZEffective gRPC2018-07-02T05:04:40Zurn:uuid:dcfa7893-97cf-4c18-b594-1e232ad48b70<blog-article posted=2018-07-02T05:04:40Z><h1 slot=title>Effective gRPC</h1><p slot=summary>This page documents habits and styles I've found useful when working with <a href=https://grpc.io/>gRPC</a> and <a href=https://en.wikipedia.org/wiki/Protocol_Buffers>Protocol Buffers</a>.</p><blog-section><h2 slot=title>gRPC</h2><blog-section><h3 slot=title>Error Reporting</h3><p>Use the <code>google.rpc.Status</code> message to report errors back to clients – this type should be special-cased by the gRPC library for your language (e.g. grpc-go has <a href=https://godoc.org/google.golang.org/grpc/status><code>"google.golang.org/grpc/status"</code></a>). This message can contain arbitrary sub-messages, so servers can offer basic error messages to all clients and structured errors to clients that can handle them.</p><p>See <a href=https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto><code>google/rpc/code.proto</code></a> for details on the meaning of each error code, and the <a href=https://cloud.google.com/apis/design/errors>Google Cloud Error Model</a> for good advice on how to write error messages.</p></blog-section><blog-section><h3 slot=title>Deadlines and Timeouts</h3><p>Server-side handlers should always propagate deadlines. Clients should almost always set deadlines. Prefer deadlines to timeouts, because the meaning of an absolute timestamp is less ambiguous than a relative time when working across a network boundary.</p><p>Depending on your implementation library, it may be possible to define default timeouts in the service schema. Don't do this – the schema author cannot predict what behavior will be appropriate for all implementations or users.</p></blog-section><blog-section><h3 slot=title>Addresses</h3><p>Always represent and store gRPC addresses as a full string, following the URL-like syntax used by <a href=https://github.com/grpc/grpc/blob/master/doc/naming.md>gRPC Name Resolution</a>. 
Restrictive formats like "IP+port tuple" will annoy users who want to run your code as part of a larger framework or integration test, which may have its own ideas about network addresses.</p><p>Let addresses be set in a command-line flag or config file, so users can configure them without having to patch your binary. Do this even if you're really really sure the entire world wants to run your service on port 80.</p></blog-section><blog-section><h3 slot=title>Streaming</h3><p>gRPC supports uni-directional and bi-directional message streams. Use streams if the amount of data being transferred is potentially large, or if the other side can meaningfully process data before the input has been fully received. For example, a service offering a SHA256 method could hash the input chunks as they arrive, then send back the final digest when the client closes the request stream.</p><p>Streaming is more efficient than sending a separate RPC for each chunk, but less efficient than a single RPC with all chunks in a repeated field. The overhead of streaming can be minimized by using a batched message type.</p><blog-code syntax=protobuf><pre>
service Foo {
  rpc MyStream(FooRequest) returns (stream MyStreamItem);
}
message MyStreamItem {
  repeated MyStreamValue values = 1;
}
message MyStreamValue {
  // ... fields for each logical value
}
</pre></blog-code><p><b>WARNING:</b> In some implementations (e.g. grpc-go), the stream handles are not thread-safe even if the client stub is. Interacting with a stream handle from multiple threads may cause unpredictable behavior, including silent message corruption.</p></blog-section><blog-section><h3 slot=title>Request / Response Types</h3><p>Each method in your service should have its own Request and Response messages.</p><blog-code syntax=protobuf><pre>
service Foo {
  rpc Bar(BarRequest) returns (BarResponse);
}
message BarRequest { ... }
message BarResponse { ... }
</pre></blog-code><p>Don't use the same message for multiple methods unless they're literally implementing the same method with a different API (e.g. unary and streaming variants accepting the same request). Even then, prefer a different type for the part of the API that may vary.</p><blog-code syntax=protobuf><pre>
service Foo {
  rpc Bar(BarRequest) returns (BarResponse);
  rpc BarStream(BarRequest) returns (stream BarResponseStreamItem);
}
message BarRequest { ... }
message BarResponse { ... }
message BarResponseStreamItem { ... }
</pre></blog-code><p><b>WARNING:</b> Do not use <code>google.protobuf.Empty</code> as a request or response type. The API documentation in <a href=https://github.com/google/protobuf/blob/master/src/google/protobuf/empty.proto><code>google/protobuf/empty.proto</code></a> is an anti-pattern. If you use Empty, then adding fields to your request/response will be a breaking API change for all clients and servers.</p></blog-section></blog-section><blog-section><h2 slot=title>Protobuf</h2><blog-section><h3 slot=title>Package Names</h3><p>Use a package name including your project name, company (if applicable), and <a href=https://semver.org/>Semantic Versioning</a> major version. The exact format depends on personal taste – popular formats include <a href=https://en.wikipedia.org/wiki/Reverse_domain_name_notation>reverse domain name notation</a> as used in Java, or <code>$COMPANY.$PROJECT</code> as used by core gRPC types.</p><ul><li><code>com.mycompany.my_project.v1</code></li><li><code>com.mycompany.MyProject.v1</code></li><li><code>mycompany.my_project.v1</code></li></ul><p>API versions that are not fully stabilized should have a version suffix like <code>v1alpha</code>, <code>v2beta1</code>, or <code>v3test</code> – see the <a href=https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning>Kubernetes API versioning policy</a> for more thorough guidance.</p><p>Protobuf package names are used in generated code, so try to avoid name components that are commonly used for built-in types or keywords (like <code>return</code> or <code>void</code>). This is especially important for generating C++, which (as of protobuf 3.6) does not have a <code>FileOption</code> to override the default <code>namespace</code> name calculation.</p></blog-section><blog-section><h3 slot=title>Import Paths</h3><p>Try to structure your proto file's on-disk layout so that <code>import</code> paths match the package name: types in <code>mycompany.my_project.v1</code> should be imported with <code>import "mycompany/my_project/v1/some_file.proto"</code>. This is not required by the Protobuf toolchain, but does help humans remember what to type.</p><p>Note that if you're using Bazel's built-in <code>proto_library()</code> rule, it doesn't currently support adjusting the import paths (<a href=https://github.com/bazelbuild/bazel/issues/3867>bazelbuild/bazel#3867</a>). Until that feature is implemented, you'll need to either write your own <code>proto_library</code> in Starlark, or simply put the .proto sources in the desired directory structure.</p></blog-section><blog-section><h3 slot=title>Next-Number Comments</h3><p>In large protobuf messages, it can be annoying to figure out which field number should be used for new fields. To simplify the life of future editors, add a comment at the end of your messages and enums.</p><blog-code syntax=protobuf><pre>
message MyMessage {
  // ... lots of fields here ...
  // NEXT: 42
}
</pre></blog-code></blog-section><blog-section><h3 slot=title>Enums</h3><p>Enum symbol scoping follows old-style C/C++ rules, so that the defined names are not scoped to the enum name:</p><blog-code syntax=protobuf><pre>
// symbol `FUN_LEVEL_HIGH` is of type `FunLevel`.
enum FunLevel {
  FUN_LEVEL_UNKNOWN = 0;
  FUN_LEVEL_LOW = 1;
  FUN_LEVEL_HIGH = 2;
  // NEXT: 3
}
</pre></blog-code><p>This can be awkward for users accustomed to languages with more modern scoping rules. I like to wrap the enum in a message:</p><blog-code syntax=protobuf><pre>
// symbol `FunLevel::HIGH` is of type `FunLevel::Enum`.
message FunLevel {
  enum Enum {
    UNKNOWN = 0;
    LOW = 1;
    HIGH = 2;
    // NEXT: 3
  }
}
</pre></blog-code></blog-section><blog-section><h3 slot=title>Tombstones</h3><p>If a field has been deleted, its field number must not be reused by future field additions<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref>. Prevent accidental field number reuse by adding tombstones with the <a href=https://developers.google.com/protocol-buffers/docs/proto#enum_reserved><code>reserved</code> keyword</a>. I always reserve both the field name and number.</p><blog-code syntax=protobuf><pre>
enum FunLevel {
  // removed -- too much fun
  reserved "FUN_LEVEL_EXCESSIVE"; reserved 10;
}
message MyMessage {
  reserved "crufty_old_field"; reserved 20;
}
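// protoc now rejects any new field or value that tries to reuse these
// reserved names or numbers.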
</pre></blog-code></blog-section><blog-section><h3 slot=title>Documentation</h3><p>Protobuf doesn't have a built-in generator for API documentation. Of the available options, <a href=https://github.com/pseudomuto/protoc-gen-doc><code>protoc-gen-doc</code></a> seems the most mature. See the <a href=https://github.com/pseudomuto/protoc-gen-doc/blob/master/README.md><code>protoc-gen-doc</code> README</a> for syntax and examples.</p></blog-section><blog-section><h3 slot=title>Validation</h3><p>Protobuf doesn't have a built-in validation mechanism, other than the <code>required</code> keyword in proto2 (removed in proto3). Lyft's <a href=https://github.com/lyft/protoc-gen-validate><code>protoc-gen-validate</code></a> tool is the best solution I know of for this, though it's in early alpha and currently only supports Go.</p></blog-section><blog-section><h3 slot=title>Optional Scalar Types</h3><p>In proto3, the ability to mark scalar fields (<code>int32</code>, <code>string</code>, etc) as optional was removed. Scalar fields are now always present, and will be a default "zero value" if not otherwise set. This can be frustrating when designing a schema for a system where <code>""</code> and <code>NULL</code> are logically distinct values.</p><p>The official workaround is a set of "wrapper types", defined in <a href=https://github.com/google/protobuf/blob/master/src/google/protobuf/wrappers.proto><code>google/protobuf/wrappers.proto</code></a>, that define single-valued messages. Your schema can use <code>.google.protobuf.Int32Value</code> instead of <code>int32</code> to get optionality.</p><blog-code syntax=protobuf><pre>
import "google/protobuf/wrappers.proto";
message MyMessage {
.google.protobuf.Int32Value some_field = 1;
}
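// Message-typed fields keep explicit presence even in proto3, so (in
// Python) msg.HasField("some_field") distinguishes unset from zero.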
</pre></blog-code><p>Another approach is to wrap the scalar field in <a href=https://developers.google.com/protocol-buffers/docs/proto3#oneof><code>oneof</code></a>, with no other choices. This forces even scalar fields to have optionality, and adds helper methods in generated code to detect if the field was set.</p><blog-code syntax=protobuf><pre>
message MyMessage {
  oneof oneof_some_field {
    int32 some_field = 1;
  }
}
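// Generated code gains presence helpers for the oneof, e.g. in Python:
//   msg.WhichOneof("oneof_some_field") is not None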
</pre></blog-code></blog-section></blog-section><blog-footnotes slot=footnotes><hr><ol><li id=fn:1><p>For a motivational lesson in reuse of field identifiers, see <a href=https://www.sec.gov/litigation/admin/2013/34-70694.pdf>SEC administrative proceeding 3-15570 against Knight Capital</a> regarding loss of $460 million USD in 45 minutes.</p></li></ol></blog-footnotes></blog-article>2018-07-02T05:04:40ZBazel School: Toolchains2018-05-25T14:34:27Zurn:uuid:af679ffe-285a-4dfe-a647-1b8fef9d5845<blog-article posted=2018-05-25T14:34:27Z><h1 slot=title>Bazel School: Toolchains</h1><div slot=summary><p>I've recently been using <a href=https://www.bazel.build/>Bazel</a> as a multi-platform distributed build system. Bazel itself supports this pretty well, but many of the user-contributed extension libraries don't make good use of Bazel's toolchains and therefore break when multiple OSes are involved in a build. I hope the situation can be improved by documenting nascent best practices.</p><p>This page is a bit advanced. It assumes background knowledge in <a href=https://en.wikipedia.org/wiki/Cross_compiler>cross compilation</a>, plus experience with Bazel's <a href=https://docs.bazel.build/versions/master/skylark/language.html>Starlark extension language</a>, <a href=https://docs.bazel.build/versions/master/skylark/rules.html>build rules</a>, and <a href=https://docs.bazel.build/versions/master/skylark/repository_rules.html>repository definitions</a>. Most users of Bazel shouldn't need to care about the details of compiler toolchains, but this is important stuff for maintainers of language rules.</p></div><blog-section><h2 slot=title>Constraints</h2><p>Bazel's package/toolchain design is based on <i>constraints</i>, which are simple text key/value pairs. Keys are defined by <a href=https://docs.bazel.build/versions/master/be/platform.html#constraint_setting><code>constraint_setting</code></a>, and values by <a href=https://docs.bazel.build/versions/master/be/platform.html#constraint_value><code>constraint_value</code></a>. Settings and values are true targets, which means they're addressed by label, obey visibility, and can be aliased.</p>
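<p>Defining your own dimension takes one <code>constraint_setting</code> plus a <code>constraint_value</code> per allowed value. A quick sketch (the <code>libc</code> setting and its values are hypothetical examples, not something Bazel predefines):</p><blog-code syntax=python><pre>
# platforms/BUILD
constraint_setting(name = "libc")

constraint_value(
    name = "glibc",
    constraint_setting = ":libc",
)

constraint_value(
    name = "musl",
    constraint_setting = ":libc",
)
</pre></blog-code><p>A couple basic constraints come predefined in <a href=https://github.com/bazelbuild/bazel/blob/0.13.0/tools/platforms/BUILD><code>@bazel_tools//platforms</code></a>:</p><blog-code><pre>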
@bazel_tools//platforms:cpu
  @bazel_tools//platforms:arm
  @bazel_tools//platforms:ppc
  @bazel_tools//platforms:s390x
  @bazel_tools//platforms:x86_32
  @bazel_tools//platforms:x86_64
@bazel_tools//platforms:os
  @bazel_tools//platforms:freebsd
  @bazel_tools//platforms:linux
  @bazel_tools//platforms:osx
  @bazel_tools//platforms:windows
</pre></blog-code><p>Note the limited selection and lack of precision. These definitions are (as of Bazel 0.13) useful only for getting started. Most language rules will want to define their own – see <a href=https://github.com/bazelbuild/rules_go/blob/0.12.0/go/toolchain/toolchains.bzl><code>@io_bazel_rules_go//go/toolchain:toolchains.bzl</code></a> for an example of custom values for the built-in settings.</p></blog-section><blog-section><h2 slot=title>Platforms</h2><p>Upstream docs:</p><ul><li><a href=https://docs.bazel.build/versions/master/platforms.html>https://docs.bazel.build/versions/master/platforms.html</a></li><li><a href=https://docs.bazel.build/versions/master/be/platform.html#platform>https://docs.bazel.build/versions/master/be/platform.html#platform</a></li></ul><p>A platform is a named set of constraint values (see above), plus some other metadata that I'm going to skip because it's part of the not-fully-implemented remote execution API. A platform can contain any number of constraint values, but at most one constraint value per constraint setting (i.e. you can't have a platform with two CPU types). Be specific – Autoconf's "<a href=https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Specifying-Target-Triplets.html>GNU Triplets</a>" are a good model to imitate here.</p><blog-code syntax=python><pre>
# platforms/BUILD
platform(
    name = "x86_64-apple-darwin",
    constraint_values = [
        "@bazel_tools//platforms:osx",
        "@bazel_tools//platforms:x86_64",
    ],
)
platform(
    name = "i686-linux-gnu",
    constraint_values = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
)
</pre></blog-code><p>The rest of this page will use the standard platform definitions built into Bazel. Custom platforms are useful if you need to constrain on other dimensions, such as CPU vendor or libc version.</p></blog-section><blog-section><h2 slot=title>Defining Toolchains</h2><p>Upstream docs:</p><ul><li><a href=https://docs.bazel.build/versions/master/toolchains.html>https://docs.bazel.build/versions/master/toolchains.html</a></li><li><a href=https://docs.bazel.build/versions/master/be/platform.html#toolchain>https://docs.bazel.build/versions/master/be/platform.html#toolchain</a></li></ul><p>To work with cross-compilation, toolchains themselves need to (1) be capable of generating non-native output binaries and (2) define their Bazel constraints.</p><blog-section><h3 slot=title>Toolchain Types</h3><p>Each category of toolchain is identified by a <i>toolchain type</i>, which is a string in the format of a build label. There is no requirement that the value actually match any defined label. I recommend using a <code>@</code>-prefixed toolchain type, to avoid potential conflicts in workspaces with multiple language rules loaded.</p></blog-section><blog-section><h3 slot=title>ToolchainInfo</h3><p>The <a href=https://docs.bazel.build/versions/master/skylark/lib/platform_common.html#ToolchainInfo><code>ToolchainInfo</code></a> provider is how your rules expose toolchain configuration to Bazel. There are no special requirements about the values you can put in, so feel free to use whatever makes sense for your language.</p><p>Skylark doesn't have a public/private distinction for struct attributes, so a convention of underscore-prefixed attribute names is borrowed from Python. It's easy for rule implementations to get access to the <code>ToolchainInfo</code> for any registered toolchain, so be clear in your docs which attributes are part of your public API.</p><p>First you define a rule type for your toolchain info:</p><blog-code syntax=python><pre>
# demo_toolchain.bzl
DEMO_TOOLCHAIN = "@rules_demo//:demo_toolchain_type"

def _demo_toolchain_info(ctx):
    return [
        platform_common.ToolchainInfo(
            compiler = ctx.attr.compiler,
            cflags = ctx.attr.cflags,
        ),
    ]

demo_toolchain_info = rule(
    _demo_toolchain_info,
    attrs = {
        # Public (not underscore-prefixed) so configurations like the
        # cross-compile example below can override it.
        "compiler": attr.label(
            executable = True,
            default = "//:demo_compiler",
            cfg = "host",
        ),
        "cflags": attr.string_list(),
    },
)
</pre></blog-code><p>Then use it to create toolchain info targets, one for each unique configuration you might want to build with:</p><blog-code syntax=python><pre>
# BUILD
load(":demo_toolchain.bzl", "DEMO_TOOLCHAIN", "demo_toolchain_info")
demo_toolchain_info(
    name = "demo_toolchain_info/i686-linux-gnu",
    cflags = ["--target-os=linux", "--target-arch=i686"],
)
demo_toolchain_info(
    name = "demo_toolchain_info/x86_64-linux-gnu",
    cflags = ["--target-os=linux", "--target-arch=amd64"],
)
</pre></blog-code></blog-section><blog-section><h3 slot=title>Registration</h3><p>Once you've got your <code>ToolchainInfo</code> rules defined, the next step is to register them. This is where the info is associated with the toolchain type and the constraint values so Bazel can auto-detect which toolchains are usable on a particular platform.</p><blog-code syntax=python><pre>
# BUILD
toolchain(
    name = "demo_toolchain_linux_x86_32",
    exec_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
    target_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
    toolchain = ":demo_toolchain_info/i686-linux-gnu",
    toolchain_type = DEMO_TOOLCHAIN,
)
toolchain(
    name = "demo_toolchain_linux_x86_64",
    # [...]
)
</pre></blog-code><p>Finally, the toolchains Bazel can use are passed to <a href=https://docs.bazel.build/versions/master/skylark/lib/globals.html#register_toolchains><code>register_toolchains</code></a> in your <code>WORKSPACE</code>. Usually this is done in a helper macro defined in the language rules, so that both the <code>toolchain()</code> rules and <code>register_toolchains(...)</code> args can be generated by the same logic.</p><blog-code syntax=python><pre>
# WORKSPACE
register_toolchains(
    "//:demo_toolchain_linux_x86_32",
    "//:demo_toolchain_linux_x86_64",
)
</pre></blog-code></blog-section></blog-section><blog-section><h2 slot=title>Using Toolchains</h2><p>Rules can declare which types of toolchains they depend on, like "needs a C++ compiler". When defining the rule, set the <code>toolchains</code> param to all the toolchain types that will be needed to run the action. Then within the implementation, fetch the <code>ToolchainInfo</code> values (the same ones defined in the toolchain info rule) and inspect the content to implement your build.</p><blog-code syntax=python><pre>
# rules.bzl
def _demo_rule(ctx):
    tc = ctx.toolchains[DEMO_TOOLCHAIN]
    # A real implementation would feed tc.compiler and tc.cflags into
    # ctx.actions.run() here.
    print("toolchain: %s %r" % (tc.compiler, tc.cflags))

demo_rule = rule(
    _demo_rule,
    toolchains = [DEMO_TOOLCHAIN],
)
</pre></blog-code></blog-section><blog-section><h2 slot=title>Cross-Compilation</h2><p>Toolchains can have different <code>exec_compatible_with</code> and <code>target_compatible_with</code> attrs. The execution compatibility is used for the platform that runs builds (i.e. the worker), and the target compatibility describes the platforms the toolchain can produce output for.</p><p>Here's the definition of a cross-compiling toolchain that runs on 32-bit Linux but generates output for 64-bit Linux:</p><blog-code syntax=python><pre>
# BUILD
load(":demo_toolchain.bzl", "DEMO_TOOLCHAIN", "demo_toolchain_info")
demo_toolchain_info(
    name = "demo_toolchain_info_linux_x86_32_cross64",
    compiler = "@demo_prebuilt_compiler_linux_x86_32//:demo_compiler",
    cflags = ["--target-os=linux", "--target-arch=amd64"],
)
toolchain(
    name = "demo_toolchain_linux_x86_32_cross64",
    exec_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
    target_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_64",
    ],
    toolchain = ":demo_toolchain_info_linux_x86_32_cross64",
    toolchain_type = DEMO_TOOLCHAIN,
)
</pre></blog-code><blog-section><h3 slot=title>Platform Selection Flags</h3><p>Bazel (as of 0.13) has two flags to override the platform selection, which are useful when the execution platform is custom-defined or different in some important way from the machine running Bazel. The most common reason is if you're building with remote workers.</p><ul><li>The <code>--platforms</code> flag specifies which platforms the final compiled binaries will run on. This flag can accept multiple platforms, in which case Bazel may generate multiple outputs for a build artifact.</li><li>The <code>--host_platform</code> flag overrides which platform is used for executing build commands. I'm hopeful that this flag could be split into <code>--host_platform</code> and <code>--remote_platform</code> in future versions of Bazel, so that some actions can be run locally even if the distributed build pool is different from the local workstation.</li></ul><p>There's also the <code>--cpu</code> and <code>--host_cpu</code> flags, which (if I understand correctly) are deprecated and exist only because the built-in C++ rules haven't been migrated to the toolchains system yet.</p></blog-section></blog-section><blog-section><h2 slot=title>Prebuilt Toolchains</h2><p>Compiler toolchains are often large, and take a while to build. Downloading prebuilt toolchains can materially improve your users' experience, but there are some extra details to be aware of:</p><ul><li>Do not use <code>uname</code>, inspection of <code>/proc</code>, or similar unsandboxed mechanisms to discover the execution platform. These interfere with users' customizations of the build environment, and can cause incorrect behavior when the execution platform is different from where the user is running Bazel.</li><li>If the toolchain is downloaded by a custom repository rule, put it in its own <code>.bzl</code> file. Repository rules are invalidated by any changes to the <code>.bzl</code> file they're defined in, and you don't want small changes to toolchains to force a re-download of large toolchain archives.</li></ul></blog-section></blog-article>2018-05-25T14:34:27ZMojibake in Surugaya Javascript2018-03-24T08:51:05Zurn:uuid:32dc7f3d-be60-4d31-8e82-e197c844c4a8<style>img{max-width:100%}</style><blog-article posted=2018-03-24T08:51:05Z><h1 slot=title>Mojibake in Surugaya Javascript</h1><div slot=summary><p>Yesterday I bought some used CDs from the online store <a href=https://www.suruga-ya.jp/>Surugaya</a>. The checkout process was broken in an interesting way: when I clicked the payment method confirmation button, nothing happened. I switched from Chrome to Firefox and was able to place an order successfully<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref>.</p><img src=/🤔/case-report-surugaya-mojibake/debugging.jpg alt="Oh boy, here I go debugging again" style=max-width:300px></div><blog-section><h2 slot=title>The bug</h2><p>A quick look in the web console showed some errors in loading the page's required Javascript:</p><img src=/🤔/case-report-surugaya-mojibake/checkout-page-console.png alt="jquery-supertextconverter-plugin.js:60: Uncaught SyntaxError: Invalid or unexpected token"><p>Indeed, line 60 of the script was obviously invalid:</p><blog-code syntax=javascript><pre>
/*! jQuery Super Text Converter 2014-03-03
* Vertion : 1.0.3
* Dependencies : jQuery *
* Author : MegazalRock (Otto Kamiya)
* Copyright (c) 2014 MegazalRock (Otto Kamiya);
* License : */
[...]
58 },{
59 zenkaku : /¥/g,
60 hankaku : '\'
61 },{
</pre></blog-code></blog-section><blog-section><h2 slot=title>Root cause analysis</h2><p>How did this happen? There are two important clues:</p><ul><li>First, Japanese editions of Windows use the Yen sign to render U+005C, instead of the backslash. This is backwards-compatibility behavior from pre-Unicode days when all characters needed to fit in a single byte – the <a href=https://en.wikipedia.org/wiki/JIS_X_0201>JIS X 0201</a> character set used 0x5C for the Yen sign, and so Japanese editions of DOS used ¥ for the directory separator. Even after Windows gained Unicode support, it still renders ¥ instead of \ when running in a Japanese locale.</li><li>Second, if the Surugaya version is compared with <a href=https://github.com/megazalrock/jquery-supertextconverter/blob/1.0.3/dist/jquery-supertextconverter-plugin.js>jquery-supertextconverter-plugin.js v1.0.3</a>, we see two changes that look intentional, and several that look erroneous:</li></ul><blog-code syntax=diff><pre>
--- https://github.com/megazalrock/jquery-supertextconverter/blob/1.0.3/dist/jquery-supertextconverter-plugin.js
+++ https://www.suruga-ya.jp/js/jquery-supertextconverter-plugin.js
@@ -21,7 +21,7 @@
hyphen: true
},
zenkakuHyphen: 'ー',
- zenkakuChilda: '〜'
+ zenkakuChilda: '縲鰀'
}, options);
stc.regexp = {
hankaku : /[A-Za-z0-9#$%&\\()*+,.\/<>\[\]{}=@;:_\^`]/g,
@@ -57,16 +57,16 @@
type: 'space'
},{
zenkaku : /¥/g,
- hankaku : '¥'
+ hankaku : '\'
},{
- zenkaku : /[ー―‐−]/g,
+ zenkaku : /[ー―‐竏綻/g,
hankaku : '-',
type : 'hyphen'
},{
zenkaku : /|/g,
hankaku : '|'
},{
- zenkaku : /[~〜]/g,
+ zenkaku : /[~縲彎/g,
hankaku : '~',
type: 'tilda'
},{
@@ -99,7 +99,7 @@
zenkaku : ' ',
type: 'space'
},{
- hankaku : /[¥\\]/g,
+ hankaku : /[\\\]/g,
zenkaku : '¥'
},{
hankaku : /[\-ー]/g,
@@ -140,7 +140,7 @@
/ラ/g, /リ/g, /ル/g, /レ/g, /ロ/g,
/ワ/g, /ヲ/g, /ン/g,
/ァ/g, /ィ/g, /ゥ/g, /ェ/g, /ォ/g,
- /ャ/g, /ュ/g, /ョ/g,
+ /ャ/g, /ュ/g, /ョ/g, /ッ/g,
/゙/g, /゚/g, /。/g, /、/g
];
this.zenkakuKanaList = [
@@ -160,7 +160,7 @@
'ラ', 'リ', 'ル', 'レ', 'ロ',
'ワ', 'ヲ', 'ン',
'ァ', 'ィ', 'ゥ', 'ェ', 'ォ',
- 'ャ', 'ュ', 'ョ',
+ 'ャ', 'ュ', 'ョ', 'ッ',
'゛', '゜', '。', '、'
];
};
</pre></blog-code><p>This is a case of <a href=https://en.wikipedia.org/wiki/Mojibake><i>mojibake</i></a>!</p><blockquote><p>Mojibake (文字化け) (IPA: [mod͡ʑibake]; lit. "character transformation", from the Japanese 文字 (moji) "character" + 化け (bake, pronounced "bah-keh") "transform") is the garbled text that is the result of text being decoded using an unintended character encoding. The result is a systematic replacement of symbols with completely unrelated ones, often from a different writing system.</p></blockquote><p>What I think happened is someone wanted to add 「ッ」 to the replacement lists at the end, so they edited the source file with some basic text editor. The editor was running in a Japanese locale and interpreted the UTF-8 source as some other encoding, causing mojibake. When the new file was saved, the corruption was preserved.</p><blog-section><h3 slot=title>Identifying the mystery encoding</h3><p>Which encoding did the editor use? A web search for 「¥」 will obviously not find anything useful, so let's use one of the other replacements. <a href="https://www.google.com/search?q=%22%E7%B8%B2%E9%B0%80%22">https://www.google.com/search?q="縲鰀"</a> has some relevant results:</p><ul><li><a href=http://q.hatena.ne.jp/1247890339>「縲鰀」とはどういう意味ですか?</a> ("What is the meaning of 「縲鰀」?")</li><li><a href=http://rentan.org/blog/2012/02/05/wave-dash/>縲鰀の謎</a> ("The mystery of 縲鰀")</li></ul><p>These confirm other people have encountered this exact error before, but neither says which encoding is involved.</p><p>Note something interesting – two of the bad replacements consumed a trailing <code>]</code>. The unknown encoding must be variable-width.</p><p>We can construct a table of likely candidates:</p><table style=margin:auto><thead><tr><th>Character</th><th>Unicode</th><th>UTF-8</th><th>Shift JIS<blog-footnote-ref>[<a href=#fn:2>2</a>]</blog-footnote-ref></th><th>EUC-JP</th></tr></thead><tbody><style scoped>th,td{text-align:right;padding:0 1.5em}</style><tr><td style=text-align:left><code>'</code></td><td><code>U+0027</code></td><td><code>x27</code></td><td><code>x27</code></td><td><code>x27</code></td></tr><tr><td style=text-align:left><code>]</code></td><td><code>U+005D</code></td><td><code>x5D</code></td><td><code>x5D</code></td><td><code>x5D</code></td></tr><tr><td style=text-align:left><code>−</code></td><td><code>U+2212</code></td><td><code>xE2 x88 x92</code></td><td><code>x81 x7C</code></td><td><code>xA1 xDD</code></td></tr><tr><td style=text-align:left><code>〜</code></td><td><code>U+301C</code></td><td><code>xE3 x80 x9C</code></td><td><code>x81 x60</code></td><td><code>xA1 xC1</code></td></tr><tr><td style=text-align:left><code>彎</code></td><td><code>U+5F4E</code></td><td><code>xE5 xBD x8E</code></td><td><code>x9C x5D</code></td><td><code>xD7 xBE</code></td></tr><tr><td style=text-align:left><code>竏</code></td><td><code>U+7ACF</code></td><td><code>xE7 xAB x8F</code></td><td><code>xE2 x88</code></td><td><code>xE3 xE8</code></td></tr><tr><td style=text-align:left><code>綻</code></td><td><code>U+7DBB</code></td><td><code>xE7 xB6 xBB</code></td><td><code>x92 x5D</code></td><td><code>xC3 xBE</code></td></tr><tr><td style=text-align:left><code>縲</code></td><td><code>U+7E32</code></td><td><code>xE7 xB8 xB2</code></td><td><code>xE3 x80</code></td><td><code>xE5 xE0</code></td></tr><tr><td style=text-align:left><code>鰀</code></td><td><code>U+9C00</code></td><td><code>xE9 xB0 x80</code></td><td><code>xEF xCD</code></td><td><code>x8F xEB xA5</code></td></tr></tbody></table><p>That did it! 
We can see how some of the bytes match up:</p><ul><li><code>0x5D</code> shows up at the end of the Shift JIS encodings.</li><li><code>0x9C</code> is at the end of <code>utf8("〜")</code> and the start of <code>shift_jis("彎")</code>.</li><li><code>utf8("〜")</code> starts with <code>0xE3 0x80</code>, which is <code>shift_jis("縲")</code>.</li></ul><p>This file was encoded in UTF-8, but edited as Shift JIS. We can test this theory using Python:</p><blog-code><pre>
$ python
>>> print u"〜]".encode("utf8").decode("shift_jisx0213")
縲彎
>>> print u"−]".encode("utf8").decode("shift_jisx0213")
竏綻
>>> print u"〜".encode("utf8").decode("shift_jisx0213")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'shift_jisx0213' codec can't decode byte 0x9c in position 2: incomplete multibyte sequence
</pre></blog-code><p>Close, but not quite. Something else is going on. <code>shift_jis('縲鰀')</code> is <code>0xE3 0x80 0xEF 0xCD</code>, and <code>0xEF</code> doesn't show up anywhere else in the table. What if the editor was being <i>really</i> clever, and restarting the encoding autodetector each time it fails to decode a multi-byte sequence?</p><blog-code><pre>
>>> bytes = u"〜".encode('utf8')
>>> bytes += '\x00' # padding
>>> print bytes[0:2].decode('shift_jisx0213') + bytes[2:4].decode('utf-16-be')
縲鰀
</pre></blog-code><p>There it is. The unknown editor thought the best way to load a UTF-8 file was to parse it as a mix of Shift JIS and big-endian UTF-16.</p></blog-section><blog-section><h3 slot=title>Impact timeline</h3><p>How long has this online store been serving up invalid, payment-breaking Javascript on its checkout page?</p><blog-code><pre>
$ curl -v -o /dev/null https://www.suruga-ya.jp/js/jquery-supertextconverter-plugin.js 2>&1 | grep -E 'Date:|Last-Modified:'
< Date: Sat, 24 Mar 2018 07:46:01 GMT
< Last-Modified: Wed, 20 Jan 2016 07:38:19 GMT
</pre></blog-code><p>Over two years. Hmmm.</p></blog-section></blog-section><blog-section><h2 slot=title>Reporting to the webmaster</h2><p>Surugaya does not have a published email address, and their order confirmation mail helpfully notes that replies are not monitored. Their contact form is at <a href=https://www.suruga-ya.jp/toiawase>https://www.suruga-ya.jp/toiawase</a>. Since they're located in Japan and have no English text on their site, I figured mangled Japanglish would be more successful than English. Here's the best I could do with Google Translate and a dictionary:</p><blockquote><pre>
こんにちは、
「支払方法の選択」のページにjavascriptのエラーがありますから、Chromeの使いの顧客は購買できません。
エラーの写真: https://i.imgur.com/N9d0J08.png
このファイル: https://www.suruga-ya.jp/js/jquery-supertextconverter-plugin.js
},{
zenkaku : /¥/g,
hankaku : '\' <- これは悪い
},{
元のファイルは正しいかもしれないと思います。
https://github.com/megazalrock/jquery-supertextconverter/blob/master/src/superTextConverter.js#L60-L63
},{
zenkaku : /¥/g,
hankaku : '¥' <- これは良い
},{
僕の変な日本語はごめんあさい。返事なら、日本語も英語もいいです。
</pre></blockquote><p>Their contact form has a "reset" button right next to the submit button, and the message field gets cleared on navigate-back, so I got to type that up twice. いい練習ですね。</p><p>When I click the submit button, nothing happens. I switch to Firefox again and am able to submit their contact form.</p><blog-section><h3 slot=title>The second bug</h3><p>The contact form directs to a confirmation page, which notes that I'm about to submit an empty message. What?</p><img src=/🤔/case-report-surugaya-mojibake/dont-talk-to-us.png alt="contact form confirmation page"><p>I can see the POST values got sent over correctly, but the confirmation page thinks I tried to submit an empty message. It's not just a rendering problem either, the "confirm" button there just serves up an error about the missing fields. Whatever's happening seems to be server-side, and I have no visibility into it.</p></blog-section><blog-section><h3 slot=title>Another attempt at contact</h3><p>Looking at the source for the page, I notice it has <code><link rev="made" href="mailto:info@act-system.com"></code> in it. Maybe this "act system" is a web development firm responsible for the shop, and they will be able to fix the script?</p><img src=/🤔/case-report-surugaya-mojibake/act-system.png alt="ACTSYSTEM home page"><p>Looks like an SEO company rather than a web developer, and the last activity is from January 2016. Probably coincidental that their final blog post was written two days before the <code>Last-Modified</code> date on that broken script.</p></blog-section></blog-section><blog-section><h2 slot=title>What did we learn, Palmer?</h2><p>Text encoding is still hard.</p><p>After making changes to your website, consider diffing to make sure the delta is what you expected.</p><p>If you're going to ignore email in favor of a contact form, consider testing your contact form.</p><p>If your online store's sales funnel drops all users of the #1 most popular browser, you may be leaving money on the table from potential customers who don't know how to debug your Javascript.</p></blog-section><blog-footnotes><hr><ol><li id=fn:1><p>Via PayPal, obviously. I'm not about to type my credit card number into a site that behaves like this.</p></li><li id=fn:2><p><a href=https://en.wikipedia.org/wiki/Shift_JIS>Shift JIS</a> unified JIS X 0201 and <a href=https://en.wikipedia.org/wiki/JIS_X_0208>JIS X 0208</a> into a single character set.</p></li></ol></blog-footnotes></blog-article>2018-03-24T08:51:05ZUNIX Syscalls2022-08-02T02:50:36Zurn:uuid:1d18861b-38a9-48d2-991b-021315a32e2a<style type=text/css scoped>table{color:#000;background-color:#fff}th{background-color:lightgrey}</style><blog-article posted=2018-03-17T22:03:09Z updated=2022-08-02T02:50:36Z><h1 slot=title>UNIX Syscalls</h1><div slot=summary><p>On UNIX-like operating systems, userland processes invoke kernel procedures using the "syscall" feature. Each syscall is identified by a "syscall number" and has a short list of parameters, both of which can vary between operating systems, hardware platforms, and configuration options.</p><p>Performing a syscall is usually done via a special assembly instruction, though some platforms use other mechanisms (e.g. a <a href=https://en.wikipedia.org/wiki/VDSO>vDSO</a>). 
This page is a catalog of how to invoke syscalls on different UNIX-like platforms.</p></div><blog-section id=int-0x80><h2 slot=title>int $0x80 (or int 80h)</h2><p><code>int $0x80</code> (also styled as <code>int 80h</code>) is the traditional syscall instruction on i386 UNIX-like platforms. It triggers a <a href=https://en.wikipedia.org/wiki/Interrupt>software interrupt</a> that transfers control to the kernel, which inspects its registers and stack to find the syscall number + parameters. It is obsolete since the mid 2000s for performance reasons, but can still be found in tutorials because it's easier to understand than more modern mechanisms.</p></blog-section><blog-section><h2 slot=title>Syscalls by OS</h2><p>(incomplete)</p><table><thead><tr><th>Name</th><th>Standard</th><th>Linux</th><th>Darwin</th><th>FreeBSD</th></tr></thead><tbody><tr><td>access</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/access.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/access.2.html>access(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/access/>access(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=access&sektion=2">access(2)</a></td></tr><tr><td>creat</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/creat.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/creat.2.html>creat(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/creat/>creat(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=creat&sektion=2">creat(2)</a></td></tr><tr><td>exchangedata</td><td></td><td></td><td><a href=https://www.unix.com/man-page/osx/2/exchangedata/>exchangedata(2)</a></td><td></td></tr><tr><td>extattr_delete_file</td><td></td><td></td><td></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=extattr&sektion=2">extattr(2)</a></td></tr><tr><td>extattr_get_file</td><td></td><td></td><td></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=extattr&sektion=2">extattr(2)</a></td></tr><tr><td>extattr_list_file</td><td></td><td></td><td></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=extattr&sektion=2">extattr(2)</a></td></tr><tr><td>extattr_set_file</td><td></td><td></td><td></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=extattr&sektion=2">extattr(2)</a></td></tr><tr><td>fallocate</td><td></td><td><a href=http://man7.org/linux/man-pages/man2/fallocate.2.html>fallocate(2)</a></td><td></td><td></td></tr><tr><td>fsync</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/fsync.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/fsync.2.html>fsync(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/fsync/>fsync(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=fsync&sektion=2">fsync(2)</a></td></tr><tr><td>stat</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/stat.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/stat.2.html>stat(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/stat/>stat(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=stat&sektion=2">stat(2)</a></td></tr><tr><td>fcntl</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/fcntl.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/fcntl.2.html>fcntl(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/fcntl/>fcntl(2)</a></td><td><a 
href="https://www.freebsd.org/cgi/man.cgi?query=fcntl&sektion=2">fcntl(2)</a></td></tr><tr><td>flock</td><td></td><td><a href=http://man7.org/linux/man-pages/man2/flock.2.html>flock(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/flock/>flock(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=flock&sektion=2">flock(2)</a></td></tr><tr><td>getxattr</td><td></td><td><a href=http://man7.org/linux/man-pages/man2/getxattr.2.html>getxattr(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/getxattr/>getxattr(2)</a></td><td></td></tr><tr><td>link</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/link.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/link.2.html>link(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/link/>link(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=link&sektion=2">link(2)</a></td></tr><tr><td>listxattr</td><td></td><td><a href=http://man7.org/linux/man-pages/man2/listxattr.2.html>listxattr(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/listxattr/>listxattr(2)</a></td><td></td></tr><tr><td>lseek</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/lseek.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/lseek.2.html>lseek(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/lseek/>lseek(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=lseek&sektion=2">lseek(2)</a></td></tr><tr><td>mkdir</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/mkdir.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/mkdir.2.html>mkdir(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/mkdir/>mkdir(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=mkdir&sektion=2">mkdir(2)</a></td></tr><tr><td>mknod</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/mknod.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/mknod.2.html>mknod(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/mknod/>mknod(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=mknod&sektion=2">mknod(2)</a></td></tr><tr><td>open</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/open.2.html>open(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/open/>open(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=open&sektion=2">open(2)</a></td></tr><tr><td>opendir</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/opendir.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man3/opendir.3.html>opendir(3)</a></td><td><a href=https://www.unix.com/man-page/osx/3/directory/>directory(3)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=directory&sektion=3">directory(3)</a></td></tr><tr><td>poll</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/poll.2.html>poll(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/poll/>poll(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=poll&sektion=2">poll(2)</a></td></tr><tr><td>read</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/read.2.html>read(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/read/>read(2)</a></td><td><a 
href="https://www.freebsd.org/cgi/man.cgi?query=read&sektion=2">read(2)</a></td></tr><tr><td>readdir</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man3/readdir.3.html>readdir(3)</a></td><td><a href=https://www.unix.com/man-page/osx/3/directory/>directory(3)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=directory&sektion=3">directory(3)</a></td></tr><tr><td>readlink</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/readlink.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/readlink.2.html>readlink(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/readlink/>readlink(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=readlink&sektion=2">readlink(2)</a></td></tr><tr><td>removexattr</td><td></td><td><a href=http://man7.org/linux/man-pages/man2/removexattr.2.html>removexattr(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/removexattr/>removexattr(2)</a></td><td></td></tr><tr><td>rename</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/rename.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/rename.2.html>rename(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/rename/>rename(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=rename&sektion=2">rename(2)</a></td></tr><tr><td>renameat2</td><td></td><td><a href=http://man7.org/linux/man-pages/man2/rename.2.html>rename(2)</a></td><td></td><td></td></tr><tr><td>rmdir</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/rmdir.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/rmdir.2.html>rmdir(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/rmdir/>rmdir(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=rmdir&sektion=2">rmdir(2)</a></td></tr><tr><td>chmod</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/chmod.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/chmod.2.html>chmod(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/chmod/>chmod(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=chmod&sektion=2">chmod(2)</a></td></tr><tr><td>chown</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/chown.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/chown.2.html>chown(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/chown/>chown(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=chown&sektion=2">chown(2)</a></td></tr><tr><td>utime</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/utime.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/utime.2.html>utime(2)</a></td><td><a href=https://www.unix.com/man-page/osx/3/utime/>utime(3)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=utime&sektion=3">utime(3)</a></td></tr><tr><td>setxattr</td><td></td><td><a href=http://man7.org/linux/man-pages/man2/setxattr.2.html>setxattr(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/setxattr/>setxattr(2)</a></td><td></td></tr><tr><td>statfs</td><td></td><td><a href=http://man7.org/linux/man-pages/man2/statfs.2.html>statfs(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/statfs/>statfs(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=statfs&sektion=2">statfs(2)</a></td></tr><tr><td>symlink</td><td><a 
href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/symlink.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/symlink.2.html>symlink(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/symlink/>symlink(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=symlink&sektion=2">symlink(2)</a></td></tr><tr><td>unlink</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/unlink.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/unlink.2.html>unlink(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/unlink/>unlink(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=unlink&sektion=2">unlink(2)</a></td></tr><tr><td>write</td><td><a href=http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html>POSIX</a></td><td><a href=http://man7.org/linux/man-pages/man2/write.2.html>write(2)</a></td><td><a href=https://www.unix.com/man-page/osx/2/write/>write(2)</a></td><td><a href="https://www.freebsd.org/cgi/man.cgi?query=write&sektion=2">write(2)</a></td></tr></tbody></table></blog-section><blog-section><h2 slot=title>Linux</h2><p>Linux syscalls are defined in <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/syscalls.h?h=v5.19">include/linux/syscalls.h</a>. Syscalls use the same parameter order across platforms, but some (e.g. <code>sys_stat64</code>) are only defined on some platforms, and others (e.g. <code>sys_clone</code>) have different parameters depending on kernel compilation options. Syscall numbers are platform-dependent.</p><p>Manpage <a href=http://man7.org/linux/man-pages/man2/syscalls.2.html>syscalls(2)</a> lists syscalls and which kernel version they were added in. Manpage <a href=http://man7.org/linux/man-pages/man2/syscall.2.html>syscall(2)</a> lists per-architecture calling conventions and register assignments.</p><p>Documentation and tutorials for implementing a Linux syscall:</p><ul><li>LWN: Anatomy of a system call [<a href=https://lwn.net/Articles/604287/>part 1</a>] [<a href=https://lwn.net/Articles/604515/>part 2</a>] [<a href=https://lwn.net/Articles/604406/>additional content</a>] (David Drysdale)</li><li><a href=https://brennan.io/2016/11/14/kernel-dev-ep3/>Tutorial - Write a System Call</a> (Stephen Brennan)</li><li><a href=https://arvindsraj.wordpress.com/2012/10/05/adding-hello-world-system-call-to-linux/>Adding hello world system call to Linux</a> (Arvind S. Raj)</li><li><a href=https://medium.com/@ssreehari/implementing-a-system-call-in-linux-kernel-4-7-1-6f98250a8c38>Implementing a system call in Linux Kernel 4.7.1</a> (Sreehari S.)</li><li><a href=https://tssurya.wordpress.com/2014/08/19/adding-a-hello-world-system-call-to-linux-kernel-3-16-0/>Adding a Hello World System Call to Linux kernel 3.16.0</a> (Surya Seetharaman)</li></ul><blog-section id=linux-i386-interrupt><h3 slot=title>Linux: i386 (INT 0x80)</h3><p>The syscall number is passed in register <code>eax</code>. Syscalls with six or fewer parameters pass them in registers [<code>ebx</code>, <code>ecx</code>, <code>edx</code>, <code>esi</code>, <code>edi</code>, <code>ebp</code>]. 
Syscalls with more than six parameters use <code>ebx</code> to pass a memory address, in a way that doesn't seem to be well documented.</p><p>Linux syscall numbers for i386 are defined in <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/entry/syscalls/syscall_32.tbl?h=v5.19">arch/x86/entry/syscalls/syscall_32.tbl</a>.</p><p>See above for background on <code>int $0x80</code>.</p><blog-code><pre>
.data
.set .L_STDOUT, 1
.set .L_SYSCALL_EXIT, 1
.set .L_SYSCALL_WRITE, 4
.L_message:
    .ascii "Hello, world!\n"
.set .L_message_len, . - .L_message
.text
.global _start
_start:
    # write(STDOUT, message, message_len)
    mov $.L_SYSCALL_WRITE, %eax
    mov $.L_STDOUT, %ebx
    mov $.L_message, %ecx
    mov $.L_message_len, %edx
    int $0x80
    # exit(0)
    mov $.L_SYSCALL_EXIT, %eax
    mov $0, %ebx
    int $0x80</pre></blog-code><p>static linking</p><blog-code syntax=commands><pre>
as --32 -o hello.o hello.s
ld -m elf_i386 -o hello hello.o
file hello
# hello: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
./hello
# Hello, world!</pre></blog-code><p>dynamic linking</p><blog-code syntax=commands><pre>
as --32 -o hello.o hello.s
ld -m elf_i386 -o hello hello.o \
    --dynamic-linker /lib/ld-linux.so.2 \
    -l:ld-linux.so.2
file hello
# hello: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, not stripped
ldd hello
# /lib/ld-linux.so.2 (0x56614000)
# linux-gate.so.1 (0xf77ba000)
./hello
# Hello, world!</pre></blog-code></blog-section><blog-section id=linux-i386-vdso><h3 slot=title>Linux: i386 (vDSO)</h3><p>A <a href=https://en.wikipedia.org/wiki/VDSO>vDSO</a> is a shared library injected into processes by the kernel, rather than loaded by the dynamic linker. It's used in i386 Linux to implement faster syscalls via the <code>SYSENTER</code> instruction available in modern 32-bit x86 processors<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref>
<blog-footnote-ref>[<a href=#fn:2>2</a>]</blog-footnote-ref>. Later kernel versions also added fast paths for certain read-only syscalls<blog-footnote-ref>[<a href=#fn:3>3</a>]</blog-footnote-ref>.</p><p>This code is slightly more complicated than the <code>int 0x80</code> example because all functions loaded from shared objects (including <code>__kernel_vsyscall</code>) must use indirect calls.</p><blog-code><pre>
.extern __kernel_vsyscall
.data
.set .L_STDOUT, 1
.set .L_SYSCALL_WRITE, 4
.set .L_SYSCALL_EXIT, 1
.L_message:
    .ascii "Hello, world!\n"
.set .L_message_len, . - .L_message
.text
.global _start
_start:
    call .L_get_pc_thunk.esi
    add $_GLOBAL_OFFSET_TABLE_, %esi
    # write(STDOUT, message, message_len)
    mov $.L_SYSCALL_WRITE, %eax
    mov $.L_STDOUT, %ebx
    mov $.L_message, %ecx
    mov $.L_message_len, %edx
    call *__kernel_vsyscall@GOT(%esi)
    # exit(0)
    mov $.L_SYSCALL_EXIT, %eax
    mov $0, %ebx
    call *__kernel_vsyscall@GOT(%esi)
.L_get_pc_thunk.esi:
    mov (%esp), %esi
    ret</pre></blog-code><p>The <code>linux-gate.so.1</code> library that will be available at runtime is not available to the linker at compile time. To get the correct symbols and ELF headers into the executable, we need to inject some fake data:</p><ul><li><code>--defsym __kernel_vsyscall=0</code> creates a place for the symbol address to be written to, once resolved. This also prevents the linker from warning about an unresolved symbol.</li><li>Creating a dummy shared object with <code>ld -shared -soname=linux-gate.so.1</code> causes the linker to add a <code>DT_NEEDED</code> entry for the vDSO, so the dynamic linker will know to use it as a source of symbol addresses.</li></ul><p>The resulting binary is a totally normal dynamic ELF executable.</p><blog-code syntax=commands><pre>
echo '.type __kernel_vsyscall STT_FUNC' | as --32 -o dummy_so.o
ld -m elf_i386 -shared \
    --defsym __kernel_vsyscall=0 \
    -soname=linux-gate.so.1 \
    -o dummy_so dummy_so.o
as --32 -o hello.o hello.s
ld -m elf_i386 -o hello hello.o \
    --dynamic-linker /lib/ld-linux.so.2 \
    -l:ld-linux.so.2 \
    dummy_so
file hello
# hello: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, not stripped
ldd hello
# /lib/ld-linux.so.2 (0x56625000)
# linux-gate.so.1 (0xf77d5000)
./hello
# Hello, world!</pre></blog-code><blog-section><h4 slot=title>Why not auxinfo?</h4><p>Some articles about the Linux vDSO describe looking up its address using the <a href=https://refspecs.linuxfoundation.org/LSB_1.3.0/IA64/spec/auxiliaryvector.html>ELF auxiliary vector</a>. I avoided this because it seems complicated and fussy:</p><ul><li><code>AT_SYSINFO</code> provides the address of <code>__kernel_vsyscall</code> directly, but is deprecated<blog-footnote-ref>[<a href=#fn:4>4</a>]</blog-footnote-ref> and requires the discovered address to be plumbed through client code (or assigned to a magic global in some very early initializer).</li><li><code>AT_SYSINFO_EHDR</code> provides the address of the vDSO, which requires further parsing using an ELF library to extract relevant symbol addresses. I don't want my programs to embed ELF parsers, especially when a perfectly good one is available in <code>ld.so</code>.</li><li>The dynamic linker solution can be trivially extended to other Linux vDSO symbols like <code>__vdso_gettimeofday</code>, again with no ELF parsing needed.</li></ul><p>The main disadvantage of my solution is it can't be used in statically linked executables, which are useful for system recovery tools (e.g. busybox) or minimal Docker containers.</p></blog-section><blog-section><h4 slot=title>Why not gs:0x10?</h4><p>I've seen one article recommend using <code>call *%gs:0x10</code> to invoke <code>__kernel_vsyscall</code>, because GNU libc uses this register to locate its early-initialized magic globals.</p><p>Don't do this. Everything I can find about glibc auxv handling indicates that the value of <code>%gs</code> is not part of the GNU libc public ABI, and it seems to be pointing to some internal data structure that happens to have the address of <code>__kernel_vsyscall</code> at offset 0x10 (<a href=http://lkml.iu.edu/hypermail/linux/kernel/0212.2/1132.html>used to be 0x18</a>). There are no guarantees that these properties will be true in the future, especially if you want your code to link against non-GNU libc implementations such as musl.</p></blog-section></blog-section><blog-section id=linux-x86-64><h3 slot=title>Linux: x86-64</h3><p>The syscall number is passed in register <code>rax</code>. Parameters are passed in registers [<code>rdi</code>, <code>rsi</code>, <code>rdx</code>, <code>r10</code>, <code>r8</code>, <code>r9</code>] – note <code>r10</code> in place of the C function-call ABI's <code>rcx</code>, because the <code>syscall</code> instruction itself clobbers <code>rcx</code>. I haven't found documentation on what x86-64 Linux does for syscalls with more than six parameters. The <code>syscall</code> instruction is used to pass control to the kernel.</p><p>Linux syscall numbers for x86-64 are defined in <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/entry/syscalls/syscall_64.tbl?h=v5.19">arch/x86/entry/syscalls/syscall_64.tbl</a>.</p>
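<p>Before dropping to assembly, these numbers can be sanity-checked from userspace through glibc's <a href=http://man7.org/linux/man-pages/man2/syscall.2.html>syscall(2)</a> wrapper. A quick sketch using Python's ctypes, assuming an x86-64 Linux host:</p><blog-code syntax=python><pre>
import ctypes

# CDLL(None) dlopens the running process, which exposes glibc's symbols.
libc = ctypes.CDLL(None, use_errno=True)
msg = b"Hello, world!\n"
libc.syscall(1, 1, msg, len(msg))  # SYS_write is 1 on x86-64
libc.syscall(60, 0)                # SYS_exit is 60; terminates the process
</pre></blog-code><p>The equivalent calls in assembly:</p><blog-code><pre>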
.data
.set .L_STDOUT, 1
.set .L_SYSCALL_EXIT, 60
.set .L_SYSCALL_WRITE, 1
.L_message:
    .ascii "Hello, world!\n"
.set .L_message_len, . - .L_message
.text
.global _start
_start:
    # write(STDOUT, message, message_len)
    mov $.L_SYSCALL_WRITE, %rax
    mov $.L_STDOUT, %rdi
    mov $.L_message, %rsi
    mov $.L_message_len, %rdx
    syscall
    # exit(0)
    mov $.L_SYSCALL_EXIT, %rax
    mov $0, %rdi
    syscall</pre></blog-code><p>static linking</p><blog-code syntax=commands><pre>
as --64 -o hello.o hello.s
ld -m elf_x86_64 -o hello hello.o
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
./hello
# Hello, world!</pre></blog-code><p>dynamic linking</p><blog-code syntax=commands><pre>
as --64 -o hello.o hello.s
ld -m elf_x86_64 -o hello hello.o \
    --dynamic-linker /lib64/ld-linux-x86-64.so.2 \
    -l:ld-linux-x86-64.so.2
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, not stripped
ldd hello
# /lib64/ld-linux-x86-64.so.2 (0x00007f472a831000)
# linux-vdso.so.1 (0x00007ffe83d7a000)
./hello
# Hello, world!</pre></blog-code></blog-section><blog-section id=linux-armv6-eabi><h3 slot=title>Linux: ARM v6 (Little-Endian, EABI)</h3><p>The syscall number is passed in register <tt>r7</tt>, and parameters in registers <tt>r0</tt> to <tt>r6</tt>.</p><p>Linux syscall numbers for ARM are defined in <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/tools/syscall.tbl?h=v5.19">arch/arm/tools/syscall.tbl</a>.</p><blog-code><pre>
.arch armv6
.data
.set .L_STDOUT, 1
.set .L_SYSCALL_EXIT, 1
.set .L_SYSCALL_WRITE, 4
.L_message:
    .ascii "Hello, world!\n"
.set .L_message_len, . - .L_message
.text
.global _start
_start:
    @ write(STDOUT, message, message_len)
    mov %r7, #.L_SYSCALL_WRITE
    mov %r0, #.L_STDOUT
    ldr %r1, =.L_message
    mov %r2, #.L_message_len
    swi #0
    @ exit(0)
    mov %r7, #.L_SYSCALL_EXIT
    mov %r0, #0
    swi #0</pre></blog-code><p>static linking</p><blog-code syntax=commands><pre>
as -EL -o hello.o hello.s
ld -m armelf_linux_eabi -o hello hello.o
file hello
# hello: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, not stripped
./hello
# Hello, world!</pre></blog-code><p>dynamic linking</p><blog-code syntax=commands><pre>
as -EL -o hello.o hello.s
ld -m armelf_linux_eabi -o hello hello.o \
    --dynamic-linker /lib/ld-linux-armhf.so.3 \
    -l:ld-linux-armhf.so.3
file hello
# hello: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, not stripped
./hello
# Hello, world!</pre></blog-code></blog-section><blog-section id=linux-riscv64><h3 slot=title>Linux: RISC-V</h3><p>The syscall number is passed in register <tt>a7</tt>, and parameters in registers <tt>a0</tt> to <tt>a6</tt>.</p><p>Linux syscall numbers for RISC-V are defined in <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/asm-generic/unistd.h?h=v5.19">include/uapi/asm-generic/unistd.h</a>.</p><blog-code><pre>
.section .rodata
.set .L_STDOUT, 1
.set .L_SYS_exit, 93
.set .L_SYS_write, 64
.L_message:
.ascii "Hello, world!\n"
.set .L_message_len, . - .L_message
.text
.global _start
_start:
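# write(STDOUT, message, message_len)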
li a7, .L_SYS_write
li a0, .L_STDOUT
la a1, .L_message
li a2, .L_message_len
ecall
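# exit(0)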
li a7, .L_SYS_exit
li a0, 0
ecall</pre></blog-code><p>static linking</p><blog-code syntax=commands><pre>
as -o hello.o hello.s
ld -m elf64lriscv -o hello hello.o
file hello
# hello: ELF 64-bit LSB executable, UCB RISC-V, double-float ABI, version 1 (SYSV), statically linked, not stripped
./hello
# Hello, world!</pre></blog-code><p>dynamic linking</p><blog-code syntax=commands><pre>
as -o hello.o hello.s
ld -m elf64lriscv -o hello hello.o \
    --dynamic-linker /lib/ld-linux-riscv64-lp64d.so.1 \
    -l:ld-linux-riscv64-lp64d.so.1
file hello
# hello: ELF 64-bit LSB executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, not stripped
./hello
# Hello, world!</pre></blog-code></blog-section></blog-section><blog-section id=darwin><h2 slot=title>Darwin (MacOS X)</h2><p>Note that I have left out the instructions to statically link binaries because they are documented as unsupported: <a href=https://developer.apple.com/library/content/qa/qa1118/_index.html>Technical Q&A QA1118: Statically linked binaries on Mac OS X</a>. Apple is also known to break the syscall ABI between MacOS versions, though it should be stable enough for the syscalls inherited from BSD.</p><p><code>lea</code> is used here because PIE addressing is required for <code>-macos_version_min 10.7</code> or later. Make sure this linker flag matches the <code>.macosx_version_min</code> value in the assembly, or the linker may reject your object code.</p><p>10.8 and later require linking with libSystem via <code>ld -lSystem</code>; earlier versions don't need that link.</p><p>The default entry point changed from <code>start</code> to <code>_main</code> in 10.8. Use <code>ld -e _main</code> to build for earlier <code>-macos_version_min</code> values.</p><blog-section id=darwin-i386><h3 slot=title>Darwin: i386</h3><blog-code><pre>
.macosx_version_min 10, 8
.data
.set L_STDOUT, 1
.set L_SYSCALL_EXIT, 1
.set L_SYSCALL_WRITE, 4
L_message:
.ascii "Hello, world!\n"
.set L_message_len, . - L_message
.text
.global _main
_main:
mov %eax, %esi
# write(STDOUT, message, message_len)
push $L_message_len
lea L_message-_main(%esi), %eax
push %eax
push $L_STDOUT
push $0 # stack padding
mov $L_SYSCALL_WRITE, %eax
int $0x80
add $16, %esp
# exit(0)
push $0 # exit code
push $0 # stack padding
mov $L_SYSCALL_EXIT, %eax
int $0x80</pre></blog-code><p>dynamic linking</p><blog-code syntax=commands><pre>
as -arch i386 -o hello.o hello.s
ld -arch i386 -macosx_version_min 10.8 -lSystem -o hello hello.o
file hello
# hello: Mach-O executable i386
otool -L hello
# hello:
# /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.60.2)
./hello
# Hello, world!</pre></blog-code></blog-section><blog-section id=darwin-x86-64><h3 slot=title>Darwin: x86-64</h3><p>In 64-bit MacOS X, syscall numbers are divided into "classes". The syscalls inherited from BSD are in <code>SYSCALL_CLASS_UNIX</code>, starting at <code>0x2000000</code>. See XNU header <a href=https://opensource.apple.com/source/xnu/xnu-4570.41.2/osfmk/mach/syscall_sw.h.auto.html>osfmk/mach/syscall_sw.h</a> for details.</p><blog-code><pre>
.macosx_version_min 10, 8
.data
.set L_STDOUT, 1
.set L_SYSCALL_EXIT, 0x2000001
.set L_SYSCALL_WRITE, 0x2000004
L_message:
.ascii "Hello, world!\n"
.set L_message_len, . - L_message
.text
.global _main
_main:
# write(STDOUT, message, message_len)
mov $L_SYSCALL_WRITE, %rax
mov $L_STDOUT, %rdi
lea L_message(%rip), %rsi
mov $L_message_len, %rdx
syscall
# exit(0)
mov $L_SYSCALL_EXIT, %rax
mov $0, %rdi
syscall</pre></blog-code><p>dynamic linking</p><blog-code syntax=commands><pre>
as -arch x86_64 hello.s -o hello.o
ld -arch x86_64 -o hello hello.o \
    -macosx_version_min 10.8 -lSystem
file hello
# hello: Mach-O 64-bit executable x86_64
otool -L hello
# hello:
# /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.60.2)
./hello
# Hello, world!</pre></blog-code></blog-section></blog-section><blog-section><h2 slot=title>FreeBSD</h2><p>The list of system calls is defined in <a href="https://cgit.freebsd.org/src/tree/sys/kern/syscalls.master?h=release/13.1.0">sys/kern/syscalls.master</a>. Syscall numbers are the same across hardware platforms.</p><blog-section id=freebsd-i386><h3 slot=title>FreeBSD: i386</h3><p><code>int $0x80</code> appears to be the only supported syscall mechanism for FreeBSD on i386. There is a vDSO at <a href="https://cgit.freebsd.org/src/tree/sys/sys/vdso.h?h=release/13.1.0">sys/sys/vdso.h</a> but it doesn't contain a Linux-style generic syscall trampoline.</p><blog-code><pre>
.data
.set .L_STDOUT, 1
.set .L_SYSCALL_EXIT, 1
.set .L_SYSCALL_WRITE, 4
.L_message:
.ascii "Hello, world!\n"
.set .L_message_len, . - .L_message
.text
.global _start
_start:
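# FreeBSD's int $0x80 convention reads arguments from the stack, laid
# out as if the kernel had been called through a C function: the "stack
# padding" words below sit where a return address would normally be.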
# write(STDOUT, message, message_len)
push $.L_message_len
push $.L_message
push $.L_STDOUT
push $0 # stack padding
mov $.L_SYSCALL_WRITE, %eax
int $0x80
add $16, %esp
# exit(0)
push $0 # exit code
push $0 # stack padding
mov $.L_SYSCALL_EXIT, %eax
int $0x80</pre></blog-code><p>static linking</p><blog-code syntax=commands><pre>
as --32 -o hello.o hello.s
ld -m elf_i386_fbsd -o hello hello.o
file hello
# hello: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), statically linked, not stripped
./hello
# Hello, world!</pre></blog-code><p>dynamic linking</p><blog-code syntax=commands><pre>
as --32 -o hello.o hello.s
ld -m elf_i386_fbsd -o hello hello.o \
    --dynamic-linker=/libexec/ld-elf.so.1 \
    -L/libexec -l:ld-elf.so.1 \
    --hash-style=gnu
file hello
# hello: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, not stripped
ldd hello
# hello:
# /libexec/ld-elf.so.1 (0x2806e000)
./hello
# Hello, world!</pre></blog-code></blog-section><blog-section id=freebsd-x86-64><h3 slot=title>FreeBSD: x86-64</h3><p>Note that older FreeBSD kernels contain a <a href="https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=182161">bug in syscall handling</a> that can cause crashes when using the <code>SYSCALL</code> instruction. Compilers targeting these old versions should use <code>INT $0x80</code> instead.</p><blog-code><pre>
.data
.set L_STDOUT, 1
.set L_SYSCALL_EXIT, 1
.set L_SYSCALL_WRITE, 4
L_message:
.ascii "Hello, world!\n"
.set L_message_len, . - L_message
.text
.global _start
_start:
# write(STDOUT, message, message_len)
mov $L_SYSCALL_WRITE, %rax
mov $L_STDOUT, %rdi
mov $L_message, %rsi
mov $L_message_len, %rdx
syscall
# exit(0)
mov $L_SYSCALL_EXIT, %rax
mov $0, %rdi
syscall</pre></blog-code><p>static linking</p><blog-code syntax=commands><pre>
as --64 -o hello.o hello.s
ld -m elf_x86_64_fbsd -o hello hello.o
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), statically linked, not stripped
./hello
# Hello, world!</pre></blog-code><p>dynamic linking</p><blog-code syntax=commands><pre>
as --64 -o hello.o hello.s
ld -m elf_x86_64_fbsd -o hello hello.o \
    --dynamic-linker=/libexec/ld-elf.so.1 \
    -L/libexec -l:ld-elf.so.1 \
    --hash-style=gnu
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, not stripped
ldd hello
# hello:
# /libexec/ld-elf.so.1 (0x800822000)
./hello
# Hello, world!</pre></blog-code></blog-section><blog-section id=freebsd-riscv64><h3 slot=title>FreeBSD: RISC-V</h3><p>The syscall number is passed in register <tt>t0</tt>, and parameters in registers <tt>a0</tt> to <tt>a6</tt>.</p><blog-code><pre>
.section .rodata
.set .L_STDOUT, 1
.set .L_SYS_exit, 1
.set .L_SYS_write, 4
.L_message:
.ascii "Hello, world!\n"
.set .L_message_len, . - .L_message
.text
.global _start
_start:
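# write(STDOUT, message, message_len)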
li t0, .L_SYS_write
li a0, .L_STDOUT
la a1, .L_message
li a2, .L_message_len
ecall
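# exit(0)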
li t0, .L_SYS_exit
li a0, 0
ecall</pre></blog-code><p>static linking</p><blog-code syntax=commands><pre>
as -o hello.o hello.s
ld -m elf64lriscv_fbsd -o hello hello.o
file hello
# hello: ELF 64-bit LSB executable, UCB RISC-V, double-float ABI, version 1 (FreeBSD), statically linked, not stripped
./hello
# Hello, world!</pre></blog-code><p>dynamic linking</p><blog-code syntax=commands><pre>
as -o hello.o hello.s
ld -m elf64lriscv_fbsd -o hello hello.o \
    --dynamic-linker=/libexec/ld-elf.so.1 \
    -L/libexec -l:ld-elf.so.1 -rpath /libexec
file hello
# hello: ELF 64-bit LSB executable, UCB RISC-V, double-float ABI, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, not stripped
ldd hello
# hello:
# ld-elf.so.1 (0x82254000)
./hello
# Hello, world!</pre></blog-code></blog-section></blog-section><blog-section id=sunos><h2 slot=title>SunOS 4.x (Solaris 1.x)</h2><blog-section id=sunos-sparc7><h3 slot=title>SunOS: SPARC v7</h3><blog-code><pre>
.seg "data"
L_STDOUT = 1
L_SYSCALL_EXIT = 1
L_SYSCALL_WRITE = 4
L_message:
.ascii "Hello world!\n"
L_message_len = . - L_message
.seg "text"
.global _start
_start:
! write(STDOUT, message, message_len)
mov L_SYSCALL_WRITE, %g1
mov L_STDOUT, %o0
set L_message, %o1
set L_message_len, %o2
ta 0
! exit(0)
mov L_SYSCALL_EXIT, %g1
mov 0, %o0
ta 0</pre></blog-code><p>static linking</p><blog-code syntax=commands prompt=%><pre>
as -o hello.o hello.s
ld -e _start -o hello hello.o
file hello
# hello: sparc demand paged executable not stripped
ldd hello
# hello: statically linked
./hello
# Hello world!</pre></blog-code></blog-section></blog-section><blog-section><h2 slot=title>Inline Assembly</h2><p>Higher-level languages sometimes let assembly be embedded directly into their object code. The exact syntax is language- and compiler-specific.</p><p>I used x86-64 Linux as the target platform for these examples, but they should work equally well if the appropriate instructions are substituted.</p><p>A note on "clobbering": compilers require the inline assembly block to declare which CPU registers <i>other than the inputs and outputs</i> may be modified. The exact set of clobbered registers is compiler-, platform-, and OS-specific<blog-footnote-ref>[<a href=#fn:5>5</a>]</blog-footnote-ref>. Linux on x86-64 clobbers <code>rcx</code> and <code>r11</code> (and maybe <code>r10</code>, as claimed by osdev?).</p><blog-section id=linux-x86-64-gnu-c><h3 slot=title>Linux: x86-64 (GNU C)</h3><p>See <a href=https://gcc.gnu.org/onlinedocs/gcc-7.3.0/gcc/Using-Assembly-Language-with-C.html>Using Assembly Language with C</a> in the GCC manual for an overview, <a href=https://gcc.gnu.org/onlinedocs/gcc-7.3.0/gcc/Machine-Constraints.html>Machine Constraints</a> for architecture-specific codes to pass parameters into an assembly block, and <a href=https://gcc.gnu.org/onlinedocs/gcc-7.3.0/gcc/Local-Register-Variables.html>Local Register Variables</a> for details on assigning values to specific registers.</p><p>I couldn't find documentation on which registers GNU C's inline assembly clobbers, if any.</p><blog-code syntax=c><pre>
static const int STDOUT = 1;
static const int SYSCALL_EXIT = 60;
static const int SYSCALL_WRITE = 1;
static const char message[] = "Hello, world!\n";
static const int message_len = sizeof(message) - 1; /* exclude the trailing NUL */
void _start() {
{ /* write(STDOUT, message, message_len) */
register int rax __asm__ ("rax") = SYSCALL_WRITE;
register int rdi __asm__ ("rdi") = STDOUT;
register const char *rsi __asm__ ("rsi") = message;
register int rdx __asm__ ("rdx") = message_len;
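/* The syscall instruction saves the return address in rcx and RFLAGS in r11, which is why both registers appear in the clobber lists. */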
__asm__ __volatile__ ("syscall"
: "+r" (rax)
: "r" (rax), "r" (rdi), "r" (rsi), "r" (rdx)
: "rcx", "r11");
}
{ /* exit(0) */
register int rax __asm__ ("rax") = SYSCALL_EXIT;
register int rdi __asm__ ("rdi") = 0;
__asm__ __volatile__ ("syscall"
:
: "r" (rax), "r" (rdi)
: "rcx", "r11");
}
}</pre></blog-code><p>static linking</p><blog-code syntax=commands><pre>
gcc -m64 -c -o hello.o hello.c
ld -m elf_x86_64 -o hello hello.o
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
./hello
# Hello, world!</pre></blog-code><p>dynamic linking</p><blog-code syntax=commands><pre>
gcc -m64 -c -o hello.o hello.c
ld -m elf_x86_64 -o hello hello.o \
    --dynamic-linker /lib64/ld-linux-x86-64.so.2 \
    -l:ld-linux-x86-64.so.2
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, not stripped
./hello
# Hello, world!</pre></blog-code></blog-section><blog-section id=linux-x86-64-llvm><h3 slot=title>Linux: x86-64 (LLVM IR)</h3><p>See <a href=https://llvm.org/docs/LangRef.html#inline-assembler-expressions>Inline Assembler Expressions</a> in the LLVM IR reference for an overview. I'm using named registers in the input list instead of moving things around in the ASM block, so that LLVM will handle the register allocation.</p><p>LLVM documentation says its ASM calls clobber registers <code>dirflag</code>, <code>fpsr</code>, and <code>flags</code> in addition to any registers clobbered by the kernel.</p><blog-code><pre>
@.message = internal constant [14 x i8] c"Hello, world!\0A"
define void @_start() {
%message_ptr = getelementptr [14 x i8], [14 x i8]* @.message , i64 0, i64 0
; write(STDOUT, message, message_len)
call i64 asm sideeffect "syscall",
"={rax},{rax},{rdi},{rsi},{rdx},~{rcx},~{r11},~{dirflag},~{fpsr},~{flags}"
( i64 1 ; {rax} SYSCALL_WRITE
, i64 1 ; {rdi} STDOUT
, i8* %message_ptr ; {rsi} message
, i64 14 ; {rdx} message_len
)
; exit(0)
call i64 asm sideeffect "syscall",
"={rax},{rax},{rdi},~{rcx},~{r11},~{dirflag},~{fpsr},~{flags}"
( i64 60 ; {rax} SYSCALL_EXIT
, i64 0 ; {rdi} exit_code
)
ret void
}</pre></blog-code><p>static linking</p><blog-code syntax=commands><pre>
llc -o hello.o hello.ll -filetype=obj
ld -m elf_x86_64 -o hello hello.o
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
./hello
# Hello, world!</pre></blog-code><p>dynamic linking</p><blog-code syntax=commands><pre>
llc -o hello.o hello.ll -filetype=obj -relocation-model=pic
ld -m elf_x86_64 -o hello hello.o \
    --dynamic-linker /lib64/ld-linux-x86-64.so.2 \
    -l:ld-linux-x86-64.so.2
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, not stripped
./hello
# Hello, world!</pre></blog-code></blog-section><blog-section id=linux-x86-64-rust><h3 slot=title>Linux: x86-64 (Rust)</h3><p>See <a href=https://doc.rust-lang.org/reference/inline-assembly.html>Inline assembly</a> in the Rust reference for an overview. As in the LLVM IR example, I'm using named registers to let the compiler handle register allocation.</p><blog-code syntax=rust><pre>
#![no_std]
#![no_main]
const STDOUT: u64 = 1;
const SYSCALL_EXIT: u64 = 60;
const SYSCALL_WRITE: u64 = 1;
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
loop {}
}
#[no_mangle]
unsafe fn _start() {
let message: &str = "Hello, world!\n";
// write(STDOUT, message, message.len())
let mut _rc: i64;
core::arch::asm!(
"syscall",
in("rax") SYSCALL_WRITE,
in("rdi") STDOUT,
in("rsi") message.as_ptr(),
in("rdx") message.len(),
out("rcx") _,
out("r11") _,
lateout("rax") _rc,
);
// exit(0)
core::arch::asm!(
"syscall",
in("rax") SYSCALL_EXIT,
in("rdi") 0,
out("rcx") _,
out("r11") _,
);
}</pre></blog-code><p>static linking</p><blog-code syntax=commands><pre>
rustc --emit obj -O -C panic=abort -o hello.o hello.rs
ld -m elf_x86_64 -o hello hello.o
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
./hello
# Hello, world!</pre></blog-code><p>dynamic linking</p><blog-code syntax=commands><pre>
rustc --emit obj -O -C panic=abort -o hello.o hello.rs
ld -m elf_x86_64 -o hello hello.o \
    --dynamic-linker /lib64/ld-linux-x86-64.so.2 \
    -l:ld-linux-x86-64.so.2
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, not stripped
./hello
# Hello, world!</pre></blog-code></blog-section></blog-section><blog-footnotes><hr><ol><li id=fn:1><p>LKML: <a href=https://lkml.org/lkml/2002/12/9/13>Intel P6 vs P7 system call performance</a> (Mike Hayward)</p></li><li id=fn:2><p>LWN: <a href=https://lwn.net/Articles/18411/>How to speed up system calls</a></p></li><li id=fn:3><p>manpage <a href=http://man7.org/linux/man-pages/man7/vdso.7.html>vdso(7)</a></p></li><li id=fn:4><p>manpage <a href=http://man7.org/linux/man-pages/man3/getauxval.3.html>getauxval(3)</a></p></li><li id=fn:5><p>See the <a href=https://wiki.osdev.org/System_V_ABI>System V ABI</a> for details.</p></li></ol></blog-footnotes></blog-article>2018-03-17T22:03:09ZSRE School: Health Checking2018-03-14T06:20:54Zurn:uuid:281a0a37-b497-4673-91aa-3eb7cdf118e2<blog-article posted=2018-03-14T06:20:54Z><h1 slot=title>SRE School: Health Checking</h1><div slot=summary><p>Any service that has complex logic or external dependencies might stop working for unexpected reasons. While instrumentation and monitoring can help bring these problems to human attention, it can be difficult to use dashboards or alerts for low-latency automated responses. A load balancer, for example, should respond to unhealthy backends on the order of seconds – long before any human can become aware of the problem.</p><p>Health checking is the process by which processes self-monitor for problems, report those problems to other parts of the service, and respond to other processes' unhealthiness in ways that mitigate overall service degradation.</p></div><blog-section><h2 slot=title>Reporting Problems</h2><p>Health checks are done not for a process's own benefit, but for the benefit of others. The first part of any health checking logic is the endpoint by which other processes poll it. This is essentially a miniature black-box monitoring system.</p><p>Health checks should almost always be performed over the same protocol that normal requests and responses will be handled by. If your HTTP server processes health checks in a separate thread pool or with a special low-dependency handler, then the risk of health checks reporting OK for an unhealthy process is significantly increased.</p><blog-section><h3 slot=title>HTTP</h3><p>Many distributed systems use HTTP as a transport protocol, so adding in a simple <code>/healthcheck</code> endpoint is popular. The semantics are usually "always respond <code>200 OK</code>", and upstream load balancers treat timeouts or other response codes as unhealthy. Repeated failed health checks cause the load balancer to stop sending requests to that backend.</p><p>A few changes to this basic model can improve the efficiency:</p><ul><li>Certain error codes can be special-cased to mean "stop sending requests immediately" – for example, <a href=https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/health_checking.html>Envoy treats 503 as a hard go-away</a>.</li><li>When using common ports like :80 or :443, an "expected name" might be attached to the request to identify which backend the load balancer expects to be talking to. When a different process is listening on that port at the moment, it will reject the health-check request and the load balancer will avoid sending it traffic for the other service.</li></ul></blog-section><blog-section><h3 slot=title>gRPC</h3><p>gRPC has a standardised and expanded version of the basic HTTP health check. 
It expects each port to respond to <code>/grpc.health.v1.Health/Check</code>, and allows requests to specify which <i>service name</i> they are for:</p><ul><li><a href=https://github.com/grpc/grpc/blob/master/doc/health-checking.md>grpc/doc/health-checking.md</a></li><li><a href=https://github.com/grpc/grpc/blob/master/src/proto/grpc/health/v1/health.proto>grpc/src/proto/grpc/health/v1/health.proto</a></li></ul><p>The handling of service names is important because each gRPC server can offer multiple gRPC services, each logically distinct and with its own health check logic. For example, an authorization server with separate "issue token" and "validate token" services might be temporarily unable to issue tokens, but could still validate any that were previously issued.</p></blog-section></blog-section><blog-section><h2 slot=title>Dependencies</h2><p>While a service <i>might</i> become unhealthy because of some internal problem, it's far more common for unhealthiness to be caused by dependencies on other components. A mail server might be unable to send email because it's getting <code>CONNECTION_REFUSED</code> from <code>smtpd</code>, unable to show existing emails because the database machine is rebooting, or unable to do anything at all because a human has manually marked its local machine as bad.</p><p>Within a single process, health status is detected and propagated to relevant services via a dependency tree. Ideally, the codebase is structured so that depending on any external resource (a database, an RPC backend, a secret key installed by Puppet) requires going through the dependency framework.</p><blog-section><h3 slot=title>Interfaces</h3><blog-code syntax=go><pre>
type HealthChecker interface {
Metadata() *Metadata
Children() []HealthChecker
HealthCheck(context.Context, func(error))
}
type Metadata struct {
Name string
Description string
}
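/* A sketch of how a monitoring framework might walk the dependency
tree built from this interface; the helper below is illustrative,
not part of the article's API. */
func walkHealthCheckers(hc HealthChecker, visit func(*Metadata)) {
visit(hc.Metadata())
for _, child := range hc.Children() {
walkHealthCheckers(child, visit)
}
}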
</pre></blog-code></blog-section><blog-section><h3 slot=title>Defining Dependencies</h3><blog-code syntax=go><pre>
type FileDependency struct {
Path string
}
var _ health.HealthChecker = (*FileDependency)(nil)
func (f *FileDependency) Metadata() *health.Metadata {
return &health.Metadata{
Name: fmt.Sprintf("local file: %s", f.Path),
}
}
func (f *FileDependency) Children() []health.HealthChecker { return nil }
func (f *FileDependency) HealthCheck(ctx context.Context, cb func(error)) {
ticker := time.Tick(time.Second)
for {
select {
case <-ticker:
fp, err := os.Open(f.Path)
if err == nil {
fp.Close()
}
cb(err)
case <-ctx.Done():
return
}
}
}
</pre></blog-code></blog-section><blog-section><h3 slot=title>Registration</h3><blog-code syntax=go><pre>
type motdImpl struct {
motdFile *FileDependency
}
func (i *motdImpl) Motd(ctx context.Context, req *pb.MotdRequest) (*pb.MotdResponse, error) {
motd, err := ioutil.ReadFile(i.motdFile.Path)
if err != nil {
return nil, err
}
return &pb.MotdResponse{Message: motd}, nil
}
func main() {
ctx := context.Background()
impl := &motdImpl{
motdFile: &FileDependency{
Path: "/etc/motd",
},
}
machineHealthyFile := &FileDependency{
Path: "/etc/machine-healthy",
}
srv := grpc.NewServer()
pb.RegisterMotdServer(srv, impl)
healthSrv := health.NewHealthServer()
grpc_health_v1.RegisterHealthServer(srv, healthSrv)
healthSrv.Register(impl.motdFile, health.ServiceName("com.example.Motd"))
healthSrv.Register(machineHealthyFile)
// waits for dependencies to become healthy
healthSrv.Start(ctx)
address := "127.0.0.1:1234"
socket, err := net.Listen("tcp", address)
if err != nil {
log.Fatalf("net.Listen(%q): %v", address, err)
}
srv.Serve(socket)
}
</pre></blog-code></blog-section></blog-section><blog-section><h2 slot=title>Server Startup</h2><p>Server startup should block until dependencies have become healthy, so that service implementation code doesn't have to deal with "half-open" dependencies (unless explicitly written to do so). Since dependencies can take a few seconds to initialize, starting them in parallel also helps reduce overall startup time.</p><p>Not all dependencies should block server startup, and some should <i>only</i> block startup but not otherwise affect health checking. The levels are:</p><ul><li><i>Hard dependencies</i> block startup until the dependency is healthy, and the service (or entire process) becomes unhealthy if the dependency is unhealthy. Examples might include the main database server, a proxy for outgoing connections, or disk space for critical logs.</li><li><i>Startup dependencies</i> block startup, but once loaded don't need to be re-checked. Examples include a per-service private key, a large file loaded from local disk, or configuration data stored remotely.</li><li><i>Optional dependencies</i> do not block startup, but do propagate health status to services that depend on them. This is useful when a single process is providing many services, and there's no problem with only accepting traffic for some of them.</li></ul></blog-section></blog-article>2018-03-14T06:20:54ZReddit Front Page (2018)2018-03-11T04:34:36Zurn:uuid:205b5db1-a747-4261-ae88-7fb92c64dbc4<blog-article posted=2018-03-11T04:34:36Z><h1 slot=title>Reddit Front Page (2018)</h1><p slot=summary>Over the past few months I've noticed I get a lot less enjoyment out of browsing Reddit. There wasn't any clear reason, just a general feeling that every hour I spent there was an hour wasted. It didn't use to be that way, I think – while there were always some forums there filled with noise, it <i>also</i> contained fresh analyses and insightful commentary and regularly surfaced them to the front page (/r/all).</p><blog-section><h2 slot=title>Filtering With RES</h2><p>My first attempt to fix things was installing <a href=https://chrome.google.com/webstore/detail/reddit-enhancement-suite/kbmfpngjjgdllneeigpgjifpgocmfgmb>Reddit Enhancement Suite</a>, a Chrome extension that implements (among other things) the ability to hide subreddits from view. When I noticed that particular noisy subreddits were taking up too much of the page, I added them to my RES blacklist. Unfortunately, RES is client-side and can't easily request more links on heavily filtered pages. I noticed that the front page would sometimes be <i>empty</i>, because every link on it came from a subreddit that I didn't want to see.</p><p>Next I tried using Reddit's built-in subreddit filtering, a feature <a href=https://www.reddit.com/r/redditmobile/comments/5frpbf/you_can_now_filter_rall_on_a_desktop_browser_and/>added in November 2016</a> to handle just this use case. After quickly hitting their limit of 99 hidden subreddits, I moved things around between their filter and RES to optimize the number of filtered links per page view. Reddit would block the most popular of the noisy subreddits, and RES could take the long tail.</p></blog-section><blog-section><h2 slot=title>Collecting More Data</h2><p>Filtering didn't seem to be very effective, and I was still regularly seeing pages containing only noise, or nothing at all. Was the problem lack of data? 
I wrote a quick-n-dirty<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref> scraper that would hit <a href=https://github.com/reddit-archive/reddit/wiki/API>Reddit's API</a>, saving the current top 2000 posts to JSON files for analysis:</p><blog-code syntax=python><pre>
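# NOTE: this is Python 2 (print statement, urllib2); on Python 3 the
# same calls live in urllib.request and urllib.parse.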
import datetime
import json
import os.path
import time
import urllib
import urllib2
timestamp = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
os.mkdir(timestamp)
after = None
for request_num in range(20):
out_filename = os.path.join(timestamp, "%02d.json" % (request_num + 1,))
print "[%s]" % (out_filename)
params = {"limit": "100"}
if after is not None:
params["after"] = after
req = urllib2.Request(
url = "https://reddit.com/r/all/.json?" + urllib.urlencode(params),
headers = {
# https://github.com/reddit-archive/reddit/wiki/API#rules
"user-agent": "darwin:com.john-millikin.redditpopularity:v1 (by /u/jmillikin)",
},
)
response_fp = urllib2.urlopen(req)
response = json.load(response_fp)
with open(out_filename, "wb") as fp:
json.dump(response, fp, indent=2)
time.sleep(2)
after = response["data"]["children"][-1]["data"]["name"]
</pre></blog-code><p>Then I extracted the most interesting fields into an SQLite database for easier querying:</p><blog-code syntax=python><pre>
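# Usage (script name is arbitrary): python <this_script>.py <snapshot-directory>
# where <snapshot-directory> holds the JSON files written by the scraper above.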
import glob
import json
import sqlite3
import sys
# https://www.sqlite.org/datatype3.html
db = sqlite3.connect(sys.argv[1].strip("/") + ".sqlite")
db.execute("""
CREATE TABLE posts (
name text,
created_utc integer,
subreddit text,
score integer,
num_comments integer,
domain text,
title text,
url text
);
""")
for filename in glob.glob(sys.argv[1] + "/*.json"):
with open(filename, "rb") as fp:
response = json.load(fp)
for list_item in response["data"]["children"]:
post = list_item["data"]
db_row = [
post["name"],
int(post["created_utc"]),
post["subreddit"],
int(post["score"]),
int(post["num_comments"]),
post["domain"],
post["title"],
post["url"],
]
insert_sql = "INSERT INTO posts VALUES (%s)" % (", ".join("?" for _ in db_row),)
db.execute(insert_sql, db_row)
db.commit()
db.close()
</pre></blog-code></blog-section><blog-section><h2 slot=title>Analysis</h2><blog-section><h3 slot=title>By Subreddit</h3><p>OK, we've got a snapshot of the top 2000 posts and can refresh it at will. Which subreddits should I filter out server-side to minimize noise on /r/all?</p><blog-code><pre>
sqlite> SELECT COUNT(DISTINCT subreddit) FROM posts;
1608
sqlite> .mode column
sqlite> .headers on
sqlite> .width 20 10
sqlite> SELECT subreddit, COUNT(*) AS count FROM posts
...> GROUP BY subreddit ORDER BY count DESC, subreddit
...> LIMIT 20;
subreddit count
-------------------- ----------
aww 5
funny 5
gaming 5
gifs 5
pics 5
politics 5
AskReddit 4
BlackPeopleTwitter 4
CrappyDesign 4
FortNiteBR 4
PrequelMemes 4
Rainbow6 4
dankmemes 4
leagueoflegends 4
memes 4
nba 4
oddlysatisfying 4
soccer 4
todayilearned 4
trees 4
</pre></blog-code><p>This result was pretty surprising to me. I had expected to see <a href=https://en.wikipedia.org/wiki/Power_law>power law</a> numbers, with "default" subreddits like /r/funny having an order of magnitude more posts on /r/all than the average. But it looks like the front-page algorithm optimizes for maximum subreddit variety, featuring over 1600 unique subreddits within the top 2000 posts. With a limit of 99 subreddits in the server-side filter, there's just no practical way to hide noise based on subreddit name.</p></blog-section><blog-section><h3 slot=title>By Domain</h3><p>Here's where that power law showed up. Take a look at which domains the top 2000 posts are pointing at:</p><blog-code><pre>
sqlite> SELECT domain, COUNT(*) AS count FROM posts
...> GROUP BY domain ORDER BY count DESC, domain
...> LIMIT 20;
domain count
-------------------- ----------
i.redd.it 925
i.imgur.com 344
gfycat.com 137
imgur.com 103
v.redd.it 56
twitter.com 32
youtube.com 29
reddit.com 8
streamable.com 8
youtu.be 8
cdna.artstation.com 7
cdnb.artstation.com 4
clips.twitch.tv 4
media.giphy.com 4
self.AskReddit 4
self.leagueoflegends 4
streamja.com 4
78.media.tumblr.com 3
cdn.discordapp.com 3
inquisitr.com 3
</pre></blog-code><p>1565 images! Out of the top 2000 posts on the world's biggest internet forum, over 75% of them are just pictures<blog-footnote-ref>[<a href=#fn:2>2</a>]</blog-footnote-ref>!</p><figure style=text-align:center><figcaption>/r/all posts per domain as of 2018-03-11 00:40:30 UTC</figcaption><echarts-chart style=height:400px>{
"animation": false,
"tooltip": {
"trigger": "item",
"formatter": "{b}: {c}"
},
"xAxis": {
"type": "category",
"axisLabel": {
"rotate": -30,
"margin": 15
},
"data": [
"i.redd.it",
"i.imgur.com",
"gfycat.com",
"imgur.com",
"v.redd.it",
"youtube.com",
"twitter.com",
"cdn*.artstation.com",
"reddit.com",
"streamable.com"
]
},
"yAxis": {
"type": "value"
},
"series": [{
"type": "bar",
"data": [
925,
344,
137,
103,
56,
{
"value": 37,
"itemStyle": { "color": "#2f4554" }
},
{
"value": 32,
"itemStyle": { "color": "#2f4554" }
},
11,
{
"value": 8,
"itemStyle": { "color": "#2f4554" }
},
{
"value": 8,
"itemStyle": { "color": "#2f4554" }
}
]
}]
}</echarts-chart></figure><p>And the dropoff is incredible – the #10 domain on /r/all has 0.4% of the posts!</p></blog-section></blog-section><blog-section><h2 slot=title>Filtering</h2><blog-section><h3 slot=title>Without Images</h3><p>What happens if we kick out all the image hosts?</p><blog-code><pre>
sqlite> SELECT COUNT(*) FROM posts;
2000
sqlite> DELETE FROM posts WHERE url LIKE "%.jpg" OR url LIKE "%.gif" OR domain IN ('i.redd.it', 'i.imgur.com', 'gfycat.com', 'giant.gfycat.com', 'imgur.com', 'v.redd.it', 'm.imgur.com', 'i.gyazo.com', 'cdna.artstation.com', 'cdnb.artstation.com', 'flickr.com') OR domain LIKE "%.media.tumblr.com";
sqlite> SELECT COUNT(*) FROM posts;
388
sqlite> SELECT domain, COUNT(*) AS count FROM posts GROUP BY domain ORDER BY count DESC, domain LIMIT 30;
domain count
------------------------------ ----------
twitter.com 32
youtube.com 29
reddit.com 8
streamable.com 8
youtu.be 8
clips.twitch.tv 4
self.AskReddit 4
self.leagueoflegends 4
streamja.com 4
inquisitr.com 3
nytimes.com 3
self.Jokes 3
self.Showerthoughts 3
thehill.com 3
variety.com 3
businessinsider.com 2
dailycaller.com 2
dailymail.co.uk 2
en.wikipedia.org 2
epicgames.com 2
newsweek.com 2
rawstory.com 2
salon.com 2
self.AskOuija 2
self.Brawlstars 2
self.CFB 2
self.Competitiveoverwatch 2
self.DestinyTheGame 2
self.LifeProTips 2
self.WritingPrompts 2
</pre></blog-code><p>We have less than a quarter of the original dataset, but the link quality is higher. I see some newspapers in the list now, and the two most popular domains together are only 15% of the links.</p></blog-section><blog-section><h3 slot=title>Without Self-Posts</h3><p>Many of the remaining high-scoring posts are "self-posts", text posted directly to Reddit by users — basically a comment. Let's look more closely to see if they might be interesting:</p><blog-code><pre>
sqlite> SELECT score, subreddit, title FROM posts WHERE domain LIKE "self.%" LIMIT 20;
score subreddit title
-------------------- -------------------- ----------------------------------------------------------------------------------------------------
47354 Showerthoughts Being a blacksmith must have been a real pantydropper back in the day seeing how Smith is the most c
6008 atheism “Religion is what keeps the poor from murdering the rich.” ―Napoleon Bonaparte
23625 AskReddit What should people stop buying?
6289 garlicoin If this post gets 20,000 upvotes, I will give 5 random commenters 1000 GRLC.
15396 Jokes A priest and a rabbi were sitting next to each other on an airplane.
1919 leagueoflegends MLG has wiped their entire LoL archive channel. This means many important pre-LCS VODS no longer exi
5852 WritingPrompts [WP] One evening, a portal to hell opens at the foot of your bed. A demon strides through, rips off
2671 CrazyIdeas I'm starting a charity to raise awareness of pyramid schemes. Donate $100 to register as a fundraise
2087 ireland IRELAND ARE 6 NATIONS CHAMPIONS UPVOTE PARTY!!!!
5334 askscience Am I using muscles to keep my eyelids open or to keep them closed or both?
5320 confession I got married tonight and it was the worst, most stressful day of my life.
1773 nintendo Happy March 10th aka MAR10 aka Mario Day!
2924 personalfinance A “subscription” box charged me for 4 of their $107 boxes without my consent and won’t refund
1125 IAmA [AMA REQUEST] A Designer For Expensive Brands Like Gucci, Louis Vuitton, Etc
7414 dadjokes My teenage daughter came home from school and she was blazing mad. “We had sex education today, da
3886 AskReddit What is something everyone knows, but no one wants to admit?
1412 YouShouldKnow YSK that by looking up "3.11" on yahoo.co.jp, 10 cents will be donated to the East Japan Earthquake
493 Competitiveoverwatch Uber: "You know what Hydration is called in the sky, Matt?"
847 leagueoflegends Clutch Gaming vs. Echo Fox / NA LCS 2018 Spring - Week 8 / Post-Match Discussion
494 canada CBC reporting Doug Ford has won PC Leadership in Ontario by the slimmest of margins. Christine Ellio
</pre></blog-code><p>Not really. There are two good links here (awareness of a disaster-relief charity and a breaking political story), but it's 90% noise. Let's delete them for now, and consider re-adding with strict filtering in the future.</p><blog-code><pre>
sqlite> DELETE FROM posts WHERE domain LIKE "self.%";
sqlite> SELECT COUNT(*) FROM posts;
223
</pre></blog-code></blog-section><blog-section><h3 slot=title>Without Sports or Video Games</h3><p>I'm assuming that anybody who cares enough about disc golf (etc) to click its posts is already subscribed directly. Let's delete posts from any subreddits that are obviously for a specific game (physical or virtual). In theory, Reddit could support this directly on their servers with a simple tagging system for subreddits.</p><blog-code><pre>
sqlite> DELETE FROM posts WHERE subreddit IN ('49ers', 'Artifact', 'Barca', 'BattleRite', 'CollegeBasketball', 'Destiny', 'DetroitRedWings', 'DotA2', 'FantasyPL', 'FortNiteBR', 'GlobalOffensive', 'GreenBayPackers', 'LiverpoolFC', 'LonghornNation', 'MkeBucks', 'NUFC', 'NYYankees', 'NintendoSwitch', 'PS4', 'SquaredCircle', 'Steam', 'StreetFighter', 'aoe2', 'baseball', 'canucks', 'chelseafc', 'civbattleroyale', 'cowboys', 'detroitlions', 'discgolf', 'eagles', 'gamernews', 'hockey', 'lakers', 'minnesotatwins', 'nba', 'osugame', 'reddevils', 'smashbros', 'soccer', 'speedrun', 'sports', 'starcraft', 'thelastofus', 'torontoraptors', 'xboxone');
sqlite> SELECT COUNT(*) FROM posts;
169
</pre></blog-code></blog-section></blog-section><blog-section><h2 slot=title>Ranking</h2><blog-section><h3 slot=title>Reddit's Default Ranking</h3><p>Here's what the front page would look like, using the above filters with Reddit's current ranking algorithm:</p><blog-code><pre>
sqlite> .width 7 12 30 22 120
sqlite> SELECT score, num_comments, domain, subreddit, title FROM posts LIMIT 25;
score num_comments domain subreddit title
------- ------------ ------------------------------ ---------------------- --------------------------------------------------------------------------------------------------------------------
23083 1466 ultimateclassicrock.com Music 40 year old rock station in Chicago replaced by Christian radio at midnight last night. Signed off with Motley Crue’s
36976 1491 youtube.com todayilearned TIL that before the Super Bowl XLI Halftime Show, the show coordinator asked Prince if he'd be alright performing in the
8255 286 businessinsider.com Futurology SpaceX rocket launches are getting boring — and that's an incredible success story for Elon Musk: “His aim: dramatic
24711 791 inquisitr.com technology Senate Bill Meant To Punish Equifax Might Actually Reward It: Thanks to last-minute changes in legislation designed to d
11283 772 cbsnews.com politics 80 percent of mass shooters showed no interest in video games, researcher says
4621 278 haaretz.com worldnews 'Caved to religious pressure': Israeli army takes down viral Women's Day video empowering female soldiers
12958 599 fox13news.com FloridaMan Florida woman jailed for 5 months because of a failed field drug test. The lab test took 7 months to come back, revealin
40663 1571 usatoday.com books Banning literature in prisons perpetuates a system that ignores inmate humanity
26701 343 dailymail.co.uk UpliftingNews Cute video shows no-kill shelter putting old chairs to good use by letting rescue dogs curl up on them in their cages
59591 3772 seattletimes.com news Costco says extra profit from tax cuts will be shared with employees
6080 285 bellinghamherald.com nottheonion A man found 54 human hands in the snow. Russia says they’re probably just trash
20262 903 indiewire.com television Bill Hader’s ‘Massive Panic Attacks’ on ‘SNL’ Inspired His New HBO Series, ‘Barry’
2097 148 scontent-lht6-1.xx.fbcdn.net batman And that's how you end the greatest live action superhero film of all time.
4092 75 web.archive.org savedyouaclick Scientists warn of mysterious and deadly new epidemic called Disease X that could kill millions around the world | "Dise
2595 69 twitter.com TrumpCriticizesTrump "I told Rex Tillerson, our wonderful Secretary of State, that he is wasting his time trying to negotiate with Little Roc
4403 314 youtube.com videos He is not using auto tune but a form of yodeling.
3885 88 youtu.be youtubehaiku [Poetry] Rejected Theme Song from READY PLAYER ONE
6152 237 inquisitr.com AgainstHateSubreddits Reddit’s Financial Ties To Jared Kushner’s Family Under Scrutiny Amid Inaction Against The_Donald Hate Speech
44691 880 aero.umd.edu science Scientists create nanowood, a new material that is as insulating as Styrofoam but lighter and 30 times stronger, doesn?
4252 364 nydailynews.com politics FedEx won't ship items like stamps, coins or ashes — but they'll ship guns at a discount
2137 131 heroichollywood.com Marvel Marvel's 'Black Panther' Joins The $1 Billion Box Office Club
6039 679 space.com space Trump Praises Commercial Space Industry at Cabinet Meeting
2277 535 clips.twitch.tv LivestreamFail OWL referee or should I say "no fun police". DED game btw
1891 119 wect.com offbeat Cop who lied to Uber driver about it being "illegal to film police" gets reinstated, abruptly retires the next day.
2827 75 salon.com esist Is Donald Trump a cult leader? Expert says he “fits the stereotypical profile”
</pre></blog-code><p>This is better than we started with, but even after all that bulk deletion we still have to contend with noise like /r/savedyouaclick (posting clickbait on purpose), /r/LivestreamFail (people I don't know doing things I will never care about), and /r/youtubehaiku (<i>America's Funniest Home Videos</i> for snake people).</p></blog-section><blog-section><h3 slot=title>By Score</h3><p>What if we rank directly on voted score?</p><blog-code><pre>
sqlite> SELECT score, num_comments, domain, subreddit, title FROM posts ORDER BY score DESC LIMIT 25;
score num_comments domain subreddit title
------- ------------ ------------------------------ ---------------------- --------------------------------------------------------------------------------------------------------------------
59591 3772 seattletimes.com news Costco says extra profit from tax cuts will be shared with employees
44691 880 aero.umd.edu science Scientists create nanowood, a new material that is as insulating as Styrofoam but lighter and 30 times stronger, doesn?
40663 1571 usatoday.com books Banning literature in prisons perpetuates a system that ignores inmate humanity
36976 1491 youtube.com todayilearned TIL that before the Super Bowl XLI Halftime Show, the show coordinator asked Prince if he'd be alright performing in the
30939 1845 youtube.com videos It's the weekend and you know what that means
26701 343 dailymail.co.uk UpliftingNews Cute video shows no-kill shelter putting old chairs to good use by letting rescue dogs curl up on them in their cages
26312 2873 jpost.com politics Putin: Jews might have been behind U.S. election interference
24711 791 inquisitr.com technology Senate Bill Meant To Punish Equifax Might Actually Reward It: Thanks to last-minute changes in legislation designed to d
23083 1466 ultimateclassicrock.com Music 40 year old rock station in Chicago replaced by Christian radio at midnight last night. Signed off with Motley Crue’s
20262 903 indiewire.com television Bill Hader’s ‘Massive Panic Attacks’ on ‘SNL’ Inspired His New HBO Series, ‘Barry’
12958 599 fox13news.com FloridaMan Florida woman jailed for 5 months because of a failed field drug test. The lab test took 7 months to come back, revealin
11932 1418 timesofisrael.com worldnews Putin suggests ‘Jews with Russian citizenship’ behind US election interference
11283 772 cbsnews.com politics 80 percent of mass shooters showed no interest in video games, researcher says
8255 286 businessinsider.com Futurology SpaceX rocket launches are getting boring — and that's an incredible success story for Elon Musk: “His aim: dramatic
7490 80 np.reddit.com bestof Redditor mentions psychiatrist Dr. Tyler Black in a thread about gamer psychology and violence, Dr. Tyler Black shows up
6657 227 en.wikipedia.org todayilearned TIL of Major Digby Tatham-Warter, a British major who brought an umbrella into battle, using it to stop an armoured vehi
6341 740 youtu.be videos Girl goes on Dr. Phil and says she is pregnant with baby Jesus. Ultrasound reveals she is literally full of shit.
6152 237 inquisitr.com AgainstHateSubreddits Reddit’s Financial Ties To Jared Kushner’s Family Under Scrutiny Amid Inaction Against The_Donald Hate Speech
6080 285 bellinghamherald.com nottheonion A man found 54 human hands in the snow. Russia says they’re probably just trash
6039 679 space.com space Trump Praises Commercial Space Industry at Cabinet Meeting
5254 784 scmp.com worldnews Putin said he “couldn’t care less” if fellow Russian citizens sought to meddle in US election, insisting such effo
4790 176 jaha.ahajournals.org science top cardiologists have better patient outcomes when they are away. Study of patient outcomes during Transcatheter Cardio
4621 278 haaretz.com worldnews 'Caved to religious pressure': Israeli army takes down viral Women's Day video empowering female soldiers
4403 314 youtube.com videos He is not using auto tune but a form of yodeling.
4252 364 nydailynews.com politics FedEx won't ship items like stamps, coins or ashes — but they'll ship guns at a discount
</pre></blog-code><p>This … is good! I would enjoy reading a Reddit front page that looked like this.</p></blog-section></blog-section><blog-section><h2 slot=title>Conclusions</h2><p>Reddit's server-side filtering options are not currently useful for /r/all, because their ranking algorithm intentionally optimizes for a small number of posts from many subreddits, but their filter has a hard capacity limit of 99 subreddits. Their filtering could be made much more effective by offering the ability to hide unwanted domains, hide self posts, and hide certain broad categories of subreddit that users can easily self-recognize (e.g. "video games", "sports", or "livestreamers").</p></blog-section><blog-section><h2 slot=title>Other Findings</h2><blog-section><h3 slot=title>/r/The_Donald</h3><p>One interesting note is that of the top 2000 posts on /r/all, none of them came from controversial right-wing political subreddit /r/The_Donald. This appears to be intentional: at the time of writing /r/The_Donald has several recent posts with scores in the 2000-8000 range, which is far above the minimum scores seen on /r/all:</p><blog-code><pre>
sqlite> SELECT score, num_comments, domain, subreddit, title FROM posts ORDER BY score ASC LIMIT 10;
score num_comments domain subreddit title
------- ------------ ------------------------------ ---------------------- --------------------------------------------------------------------------------------------------------------------
55 15 pcper.com hardware AMD FreeSync 2 for Xbox One S and Xbox One X
57 0 nytimes.com netneutrality Washington Governor Signs First State Net Neutrality Bill
63 5 oregonlive.com oregon Bend woman gets 21 years for drugging kids so she could go tanning, do CrossFit
70 4 bitcoinafrica.io BasicIncome Universal Basic Income Experiment Launches in Kenya and Uganda Partly Funded by Bitcoin
71 40 youtube.com SugarPine7 Sexy nightmare.
75 41 reddit.com Drama /u/GallowBoob outs his sockpuppet as he justifies his pedophilia.
77 1 vstinner.github.io Python How Victor Stinner fixed a very old GIL race condition in Python 3.7
79 12 dailymail.co.uk EnoughLibertarianSpam Pro-gun poster girl is shot in the back by her four-year-old son
97 81 youtube.com eurovision It's Benjamin Ingrosso with "Dance You Off" for Sweden!
98 228 baytoday.ca CanadaPolitics Doug Ford wins PC leadership race in close vote
</pre></blog-code><p>I am personally happy about this because I find candidate-specific political forums very noisy, but it's not clear how to reconcile this behavior with the Reddit administration's public claims of content neutrality.</p></blog-section></blog-section><blog-footnotes slot=footnotes><hr><ol><li id=fn:1><p>My first attempt used <a href=https://pypi.python.org/pypi/praw>praw</a>, but it requires a registered client ID and I didn't want to go through that just to get a few MB of JSON.</p></li><li id=fn:2><p>You might object that pictures can be useful, but those aren't the kind Reddit upvotes. Currently #1 on the site, judged by the users to be more interesting than any other post, is a <a href=https://www.reddit.com/r/funny/comments/83iv2w/this_is_one_of_my_favorite_scenes_from_how_its/>gif of a falling tree plus fake captions</a>.</p></li></ol></blog-footnotes></blog-article>2018-03-11T04:34:36ZRe:Creators Episode 212018-03-10T21:24:27Zurn:uuid:7c69ff9b-1f00-4e60-8432-03ba734e8a51<style>img{max-width:100%}figure{margin:1em}figcaption{font-weight:700;margin:0 0 1em}</style><blog-article posted=2018-03-10T21:24:27Z><h1 slot=title>Re:Creators Episode 21</h1><p>Episode 21 of <a href=https://en.wikipedia.org/wiki/Re:Creators>Re:Creators</a> ends with a rather nicely typeset message to the viewer, in Japanese and Latin:</p><img src=/🤔/recreators-episode-21/21m23s.png alt="Re:Creators episode 21, at 21m 23s"><blockquote>Mundum divit factum, atque pulchre.<br>世界は豊かに、そして美しく</blockquote><p>That's some unusual Latin. I wonder how they translated it? Can I do better?</p><p>Google Translate<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref> and Yandex Translate both support Latin output, but their results don't match the screencap. Bing doesn't support Latin at the time of writing<blog-footnote-ref>[<a href=#fn:2>2</a>]</blog-footnote-ref>.</p><div style=display:flex><figure style=flex:50%><figcaption>Google Translate</figcaption><img src=/🤔/recreators-episode-21/google-translate.png alt="Google Translate: Dives est mundus et pulchra"></figure><figure style=flex:50%><figcaption>Yandex Translate</figcaption><img src=/🤔/recreators-episode-21/yandex-translate.png alt="Yandex Translate: Mundus est dives, et pulchra"></figure></div><p>Note that Google and Yandex have nearly identical outputs after accounting for Latin's order-independent grammar, and it's a reasonable-looking solution.</p><p>Let's look more directly at the vocabulary being used:</p><ul><li><i>Mundus</i> / <i>mundum</i> is a direct translation of「<a href=http://jisho.org/word/世界>世界</a>」. Using a <a href=http://latindictionary.wikidot.com/noun:mundus>Latin dictionary</a> as reference, we see that <i>mundus</i> is in the <a href=https://en.wikipedia.org/wiki/Nominative_case>nominative case</a>, and <i>mundum</i> is in the <a href=https://en.wikipedia.org/wiki/Accusative_case>accusative case</a>. In English, we use word order to make this distinction – "The mundus is …", "… around the mundum". In「世界は」, the particle「は」has semantics similar to English's "Regarding the …". So we should use the nominative case: <i>mundus</i>.</li><li><a href=http://latindictionary.wikidot.com/adjective:dives><i>dives</i></a> is an adjective, being used here as a translation for「<a href=http://jisho.org/word/豊か>豊か</a>」. <i>divit</i> isn't a Latin word, or at least not one I can find in any dictionary. Are we done with this part? 
Not quite – <i>dives</i> means wealthy<blog-footnote-ref>[<a href=#fn:3>3</a>]</blog-footnote-ref>, but「豊か」has the slightly different meaning of plentiful, abundant, or bountiful. Wiktionary suggests <a href=https://en.wiktionary.org/wiki/copia#Latin><i>copia</i></a> or <a href=https://en.wiktionary.org/wiki/abundantia#Latin><i>abundantia</i></a> would be more fitting.</li><li><a href=http://latindictionary.wikidot.com/conjunction:atque><i>atque</i></a> is one of the Latin equivalents for <i>and</i>. This is our clue that it's being used as a translation for the conjunction「<a href=http://jisho.org/word/然して>そして</a>」. It's not quite a good fit though – Lewis & Short's <i>A Latin Dictionary</i> notes<blog-footnote-ref>[<a href=#fn:4>4</a>]</blog-footnote-ref> that <i>atque</i> is used before words starting with vowels, and <i>ac</i> is the form before consonants. The examples also make it clear that <i>atque</i> is a much tighter binding than「そして」, which is usually translated as "and thus" or "and therefore". Perhaps <i>atque</i> would be more like「<a href=http://jisho.org/word/と>と</a>」? We'll come back to this in a moment.</li><li><a href=http://latindictionary.wikidot.com/adjective:pulcher><i>pulcher</i></a> is another adjective, and an exact match for「<a href=http://jisho.org/word/美しい>美しい</a>」. Both translations use the <a href=https://en.wikipedia.org/wiki/Vocative_case>vocative case</a>, with <i>pulchre</i> being the masculine form and <i>pulchra</i> the feminine. I don't have high confidence that either of these is correct – vocative is an odd choice because we're not talking to the world itself. Let's use the accusative: <i>pulchrum</i>.</li></ul><p>We now have enough to attempt a literal translation:</p><blockquote>世界は豊かに、そして美しく<br>Mundus est copiosum ac pulchrum.<br>The world is bountiful and beautiful.</blockquote><p>But we can do better! The original text appears to be using grammatical forms from <a href=https://en.wikipedia.org/wiki/Classical_Japanese_language>Classical Japanese</a>. To a native reader it would seem slightly poetic or literary, the feeling of which we can reproduce in Latin and English by adjusting the vocabulary and word order. Amazon's English subtitles translated this as "The world is full of abundance and beauty". As a native speaker I don't know how to name the "X is Y" -> "X is full of Y" pattern, but it does seem to add a certain poetic feeling.</p><p>First, let's review the use of <i>atque</i> / <i>ac</i>. To me, those words seem more suited for "I went to the store to buy eggs ac milk". Latin has a <a href=https://en.wiktionary.org/wiki/-que><i>‑que</i></a> suffix, a conjunction appended to words to imply they go together and are somehow related. The world is beautiful because it is abundant, so <i>‑que</i> may be a good fit here.</p><p>To convert the adjective <i>pulcher</i> into a noun, we need to add the <a href=https://en.wiktionary.org/wiki/-tudo#Latin><i>‑tudo</i></a> suffix – more specifically, the accusative case <i>‑tudinem</i>. While we're at it, let's use <i>abundantia</i> (accusative: <i>abundantiam</i>) instead of <i>copia</i> to match Amazon.</p><p>Putting these adjustments together, we arrive at this translation:</p><blockquote>世界は豊かに、そして美しく<br>Mundus abundantiam plenus est pulchritudinemque.<br>The world is full of abundance and beauty.</blockquote><p>I don't like how long this Latin is. Romans valued brevity, so let's step back a bit toward our first translation. 
By using the verb <a href=https://en.wiktionary.org/wiki/abundo#Latin><i>abundat</i></a> and our adjective <i>pulchrum</i> we can trim out almost half of it. I'm using <i>et</i> for this one instead of <i>‑que</i> to resemble Horace's <a href=https://en.wikipedia.org/wiki/Dulce_et_decorum_est_pro_patria_mori>"<i>dulce et decorum est</i> …"</a>
<blog-footnote-ref>[<a href=#fn:5>5</a>]</blog-footnote-ref>, and moving <i>mundus est</i> to the end:</p><blockquote>世界は豊かに、そして美しく<br>Abundat et pulchrum mundus est.<br>The world is abundant and beautiful.</blockquote><p>This looks reasonable. I'm content with it.</p><p>If you've somehow made it to the end of this page and thought "I want to read another eight hundred pages of this", find a copy of <a href=https://en.wikipedia.org/wiki/Le_Ton_beau_de_Marot>Le Ton beau de Marot</a> by Hofstadter.</p><blog-footnotes><hr><ol><li id=fn:1><p><a href=https://latin.stackexchange.com/questions/4349/what-is-google-translate-good-for>Google Translate does struggle with Latin</a>, but can usually get the gist across.</p></li><li id=fn:2><p>Bing does support Klingon! Any trekkies around who can check their work?</p><img src=/🤔/recreators-episode-21/bing-klingon.png alt="Bing Translate: Japanese to Klingon"></li><li id=fn:3><p>The word <i>dives</i> is familiar to many English speakers via <a href=https://en.wikipedia.org/wiki/Rich_man_and_Lazarus>The Parable of Dives and Lazarus</a>.</p></li><li id=fn:4><p><a href="http://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:1999.04.0059:entry=atque">http://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:1999.04.0059:entry=atque</a></p></li><li id=fn:5><p>This may seem strange given what I said about <i>ac</i> earlier. A native Roman speaker of Latin would probably have considered them equivalent, but in modern times the fame of Horace makes <i>et</i> seem a bit fancier.</p></li></ol></blog-footnotes></blog-article>2018-03-10T21:24:27ZSRE School: Instrumentation2018-03-03T18:52:24Zurn:uuid:bb0fbcbc-7a4f-4347-8e71-8d48e4e5ea91<blog-article posted=2018-03-03T18:52:24Z><h1 slot=title>SRE School: Instrumentation</h1><p slot=summary>Instrumentation is the foundation of a monitoring infrastructure. It is the part that directly touches the system(s) being monitored, the source of raw data for our collectors and analyzers and dashboards. It is also the only part that is not under an SRE team's direct control – instrumentation is usually plumbed through the codebase by product teams. Given this, an SRE's primary source of leverage is to make adding instrumentation as easy and painless as possible. We do this by writing instrumentation libraries with friendly, approachable, idiomatic APIs.</p><blog-section><h2 slot=title>Metrics</h2><p>Each measurable property of the system is a <i>metric</i>. Repeated measurements of a metric's value yield a <a href=https://en.wikipedia.org/wiki/Time_series>time series</a> of <i>data samples</i>. A metric's definition includes metadata about how to collect, aggregate, and interpret its samples.</p><p>Metric values can in theory be of any serializable data type, but in practice they are numbers, text, or distributions:</p><ul><li>Numeric metrics may have an associated unit, ideally in a machine-readable annotation. This is most important for metrics where the "natural" definition of a unit is divisible, e.g. to record time intervals as an integral amount of milliseconds instead of a fractional amount of seconds.</li><li>Text metrics are most often constants, but are sometimes used for gauges if there's a small number of possible values.</li><li>Distributions are used for metrics with a very large set of possible values. 
They are usually visualized as a histogram or heat map.</li></ul><p>A C-style enumeration such as <code>enum { OPT_FOO = 1; OPT_BAR = 2; }</code> is best reported as <code>"OPT_FOO"</code> and <code>"OPT_BAR"</code><blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref> instead of numeric <code>1</code> and <code>2</code>.</p><p>Booleans can be thought of as the enum <code>{ FALSE, TRUE }</code>. Some monitoring systems give them a separate type to simplify query planning and analysis.</p><p>Metrics can be defined ad-hoc at the point of emission, or statically in some global type. I prefer statically declared metrics because that gives the opportunity to attach <a href=#metric-metadata>metric metadata</a>.</p><p>There are four common categories of metrics: constants, gauges, counters, and distributions<blog-footnote-ref>[<a href=#fn:2>2</a>]</blog-footnote-ref>.</p><blog-section><h3 slot=title>Constants</h3><p>A metric that does not change for the lifetime of its associated system component. Samples of a constant metric will always contain the same value. Common examples are build information (e.g. git commit ID), process start time, and process ID. Don't use constants for things that are only constant-ish, such as hostnames.</p><p>Constants can be text or numbers. For numbers, integers usually work better than floats (e.g. represent your start time as <code>int64 milliseconds</code> instead of <code>float64 seconds</code>).</p><table><thead><tr><th>Time</th><th><code>/build/timestamp</code> (seconds since UNIX epoch)</th><th><code>/build/revision_id</code></th></tr></thead><tbody><tr><td>2011-12-13 14:15</td><td>1300000000</td><td>git:da39a3ee5e6b4b0d3255bfef95601890afd80709</td></tr><tr><td>2011-12-13 14:16</td><td>1300000000</td><td>git:da39a3ee5e6b4b0d3255bfef95601890afd80709</td></tr><tr><td>2011-12-13 14:17</td><td>1300000000</td><td>git:da39a3ee5e6b4b0d3255bfef95601890afd80709</td></tr></tbody></table><p>In Go, using a constant metric might look something like this:</p><blog-code syntax=go><pre>
import "foo.com/my/monitoring/impl/metric"
var (
_TIMESTAMP int64 /* filled in by linker */
_REVISION_ID string /* filled in by linker */
metric.NewConstantInt64("/build/timestamp", _TIMESTAMP)
metric.NewConstantString("/build/revision_id", _REVISION_ID)
)
</pre></blog-code></blog-section><blog-section><h3 slot=title>Gauges</h3><p>A gauge metric can vary freely across its possible value range. Think of them like tachometers.</p><p>Gauges can be text or numbers.</p><ul><li>Example integer gauges are memory allocation, thread count, active RPC count.</li><li>Example text gauges are mutable config settings (e.g. backend addresses), environment variables, and hostnames.</li></ul><table><thead><tr><th>Time</th><th><code>/proc/thread_count</code></th><th><code>/proc/working_directory</code></th></tr></thead><tbody><tr><td>2011-12-13 14:15</td><td>200</td><td>/var/www/current</td></tr><tr><td>2011-12-13 14:16</td><td>250</td><td>/var/www/previous</td></tr><tr><td>2011-12-13 14:17</td><td>230</td><td>/var/www/current</td></tr></tbody></table><p>In Go, using a gauge metric might look something like this:</p><blog-code syntax=go><pre>
import "foo.com/my/monitoring/impl/metric"
var (
threadCount = metric.NewGaugeInt64("/proc/thread_count")
workingDir = metric.NewGaugeString("/proc/working_directory")
)
func updateMetrics() {
threadCount.Set(int64(runtime.NumGoroutine()))
wd, _ := os.Getwd()
workingDir.Set(wd)
}
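
// Gauges are point-in-time readings, so they must be refreshed before each
// collection. A minimal sketch (assuming a collector that scrapes on its own
// schedule) is to refresh them from a background ticker:
func startMetricsLoop() {
	go func() {
		for range time.Tick(5 * time.Second) {
			updateMetrics()
		}
	}()
}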
</pre></blog-code></blog-section><blog-section><h3 slot=title>Counters</h3><p>A counter metric must be a number, and can only increase during the lifetime of the system. Counters are almost always integers to avoid the implications of IEEE-754 rounding.</p><p>Example counter metrics are CPU microseconds spent, or the total request count.</p><p>Counters can only increase. If the metric collector sees that a new value is lower than the older value, it knows a <i>metric reset</i> has occurred. Resets happen when a process restarts, clearing in-memory state of the counter.</p><table><thead><tr><th>Time</th><th><code>/net/http/server/request_count</code></th><th></th></tr></thead><tbody><tr><td>2011-12-13 14:15</td><td>10000</td><td></td></tr><tr><td>2011-12-13 14:16</td><td>11000</td><td></td></tr><tr><td>2011-12-13 14:17</td><td>1500</td><td>RESET</td></tr></tbody></table><p>In Go, defining a counter metric might look something like this:</p><blog-code syntax=go><pre>
import "foo.com/my/monitoring/impl/metric"
var (
requestCount = metric.NewCounterInt64("/net/http/server/request_count")
)
func handler(w http.ResponseWriter, req *http.Request) {
requestCount.Increment() // or .IncrementBy(1)
}
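
// A collector detects a reset when a newly scraped value is lower than the
// previous one. A minimal sketch of the collector-side logic (not part of
// the hypothetical metric API above):
func counterDelta(prev, cur int64) int64 {
	if cur < prev {
		// Reset: the process restarted and the counter began again at
		// zero, so the entire new value is growth since the reset.
		return cur
	}
	return cur - prev
}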
</pre></blog-code></blog-section><blog-section><h3 slot=title>Distributions</h3><p><a href=https://en.wikipedia.org/wiki/Frequency_distribution>Distributions</a> are used for metrics with a very large set of possible values. They are usually visualized as a histogram or heat map.</p><p>Examples include request latencies, client IP addresses<blog-footnote-ref>[<a href=#fn:3>3</a>]</blog-footnote-ref>, and aggregations of constant/gauge/counter metrics from other sources.</p><table><thead><tr><th>Time</th><th><code>/net/http/server/response_latency</code> (seconds)</th></tr></thead><tbody><tr><td>2011-12-13 14:15</td><td><pre>
[ 0, 2) #
[ 2, 3) ###
[ 3, 5) #######
[ 5, 8) ####
[ 8, 13) ##
[13, ∞)
</pre></td></tr><tr><td>2011-12-13 14:16</td><td><pre>
[ 0, 2) #
[ 2, 3) ####
[ 3, 5) ########
[ 5, 8) ###
[ 8, 13) #
[13, ∞)
</pre></td></tr><tr><td>2011-12-13 14:17</td><td><pre>
[ 0, 2)
[ 2, 3) #
[ 3, 5) ##
[ 5, 8) #####
[ 8, 13) ########
[13, ∞) #
</pre></td></tr></tbody></table><p>In Go, defining a distribution metric might look something like this:</p><blog-code syntax=go><pre>
import "foo.com/my/monitoring/impl/metric"
var (
latency = metric.NewDurations(
"/net/http/server/response_latency",
metric.BinDurations([]time.Duration{
2 * time.Second,
3 * time.Second,
5 * time.Second,
8 * time.Second,
13 * time.Second,
})
)
)
func handler(w http.ResponseWriter, req *http.Request) {
start := time.Now()
defer func() {
latency.Sample(time.Now() - start)
}()
}
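
// Bins can also be generated by a function instead of a static list (see the
// note after this example). A sketch of an exponential binning helper; the
// name and signature are assumptions, not part of the example library:
func exponentialBins(base time.Duration, factor float64, count int) []time.Duration {
	bins := make([]time.Duration, count)
	d := base
	for i := range bins {
		bins[i] = d
		d = time.Duration(float64(d) * factor)
	}
	return bins
}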
</pre></blog-code><p>Each distribution is also inherently a set of counters, because recording a sample in one of the bins will increment that bin's count. This property can be used to simplify some monitoring configurations.</p><p>Bins can be defined statically (as in the example above), or using a function. Binning might be performed either by the system reporting the metric, or by the monitoring infrastructure.</p><ul><li>With <b>client-side binning</b>, the reporter decides how fine-grained the distribution should be.<ul><li>This is usually configurable per-metric by a command-line flag or config setting.</li><li>Changing the binning can cause vertical aberrations in visualisations.</li></ul></li><li>With <b>collector-side binning</b>, the client reports the events as-is and the monitoring infrastructure aggregates the data before storing/forwarding it.<ul><li>Example: collector receives raw distribution samples from its clients, and records {50,90,95,99}th percentiles over a trailing window.</li><li>This can be significantly less flexible, and it is often difficult to visualize percentiles as usefully as a full distribution.</li></ul></ul></blog-section><blog-section><h3 slot=title>Metric Names</h3><p>I know of three styles for metric names:</p><ul><li>The <a href=https://prometheus.io/docs/practices/naming/>Prometheus Style Guide</a> recommends <code>myapp_descriptive_snake_case</code>, where <code>myapp_</code> is a one-word prefix specific to the system being monitored. This style is derived from <a href=http://landing.google.com/sre/book/chapters/practical-alerting.html>Google Borgmon</a>, which uses metric names as symbols in its configuration DSL<blog-footnote-ref>[<a href=#fn:4>4</a>]</blog-footnote-ref>.<ul><li><a href=https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CW_Support_For_AWS.html>Amazon CloudWatch metrics</a> use this format, without a prefix.</li></ul></li><li><a href=https://github.com/etsy/statsd>statsd</a> and its derivatives use <code>short.dotted.words</code>, though the exact symbol set can vary between vendors.<ul><li>For example, <a href=https://help.datadoghq.com/hc/en-us/articles/203764705-What-are-valid-metric-names->DataDog allows alphanum, underscores, and periods</a>.</li></ul></li><li><a href=https://cloud.google.com/monitoring/api/metrics_gcp>Google Stackdriver</a> uses <code>myapp.com/unix/filesystem/paths</code>, with each product having its own "subdirectory" in the metrics hierarchy.<ul><li>The same style is applied to <a href=https://cloud.google.com/monitoring/api/metrics_aws>AWS metrics in Stackdriver</a>, by adding product-specific prefixes for each CloudWatch metric.</li></ul></li></ul><p>My personal favorite is the UNIX paths style, which I've seen used to great success. Engineers exposed to this style begin to naturally lay out metric hierarchies, with clear meanings and good namespacing. I don't have any solid data about <i>why</i> the naming style has such an effect, but I suspect it has something to do with familiarity:</p><ul><li>A metric name like <code>http_request_count</code> is well and good, but <code>myapp_com.net.http.server.request_count</code> looks <i>wrong</i> to an experienced engineer. Expressions that use that many dots violate the <a href=http://wiki.c2.com/?LawOfDemeter>Law of Demeter</a>.</li><li>In contrast, path-shaped metric names like <code>myapp.com/net/http/server/request_count</code> inspire no such negative thoughts. 
Long paths are common in UNIX environments, and such a name is certainly no harder to remember than many of the paths in Linux's <code>sysfs</code>.</li></ul></blog-section></blog-section><blog-section><h2 slot=title>Traces</h2><p>While metrics help you understand the system in aggregate, traces are used to understand the relationship between the parts of a system that processed a particular request.</p><p>A trace is a tree of <i>spans</i>, which each represent a logical region of the system's execution time. Spans are nested – all spans except the <i>root span</i> have a <i>parent span</i>, and a trace is constructed by walking the tree to link parents with their children.</p><pre>
######################## GET /user/messages/inbox
###### User permissions check
#### Read template from disk
######### Query database
### Render page to HTML
## Compress response body
###### Write response body
</pre><p>Spans and traces can be understood by analogy to lower-level programming concepts. If a trace is a stack trace, then a span is a single stack frame. Just as every stack frame is pushed and popped, each span begins and ends. It's the timing of when the spans begin and end that is interesting when analyzing a trace.</p><p>Each span is implicitly a sample of a duration distribution, and therefore also a counter<blog-footnote-ref>[<a href=#fn:5>5</a>]</blog-footnote-ref>.</p><p>Tools for creating and recording traces are currently less mature than for creating metrics, and a wide variety of tracing platforms exists. <a href=http://opentracing.io/>OpenTracing</a> is an attempt to provide vendor-neutral APIs for many languages so that tracing support can more easily be added to shared libraries.</p></blog-section><blog-section><h2 slot=title>Events</h2><p>Events are conceptually similar to logging, but with an implied increase in how interesting a human will find the event as compared to normal logs. A web server might log a message for every request, but only record an event for things like unhandled exceptions, config file changes, or 5xx error codes.</p><p>Events are usually rendered in dashboards on top of visualized metric data, so humans can correlate them with potential production impact. For example, an oncaller might be better able to debug a spike in request latency if the dashboard shows it was immediately preceded by a config change.</p><p>Events can also be archived to a searchable event log. This can be useful when investigating unexpected behavior that occurred in a large window of time – logs may be too noisy to search, but the event log can quickly find "all SSH logins to this machine in the last 3 hours".</p><p>Events that indicate programming errors should be recorded in a ticket tracking system, then assigned to an engineer for diagnosis and correction. This should be relatively rare – if your service encounters unhandled errors more than once a month or so, then you should improve its automated test suite.</p></blog-section><blog-section><h2 slot=title>Metric Metadata</h2><p>A raw stream of numbers can be useful to authors of the system who are deeply familiar with its internal details, but can be opaque to other engineers (including oncall SREs). Attaching <i>metadata</i> to metrics at their point of definition can help with this by acting as type hints, documentation, and cross-references.</p><p>Types of metadata that might be added include:</p><ul><li>Human-readable <b>documentation</b>, such as a description of the metric's deeper meaning. Very nice to have when staring at hundreds of similar-looking metrics in a dashboard builder.</li><li><b>Numeric units</b>, so the monitoring system can combine millisecond-resolution data from one system with minute-resolution data from another. Or bytes with gibibytes, or Mb/s with kB/s.</li><li><b>Tags</b> (see below), which benefit from pre-definition and strong typing in the same way metrics do.</li><li><b>Source code location</b>, usually inserted automatically by the monitoring library. Navigating from a dashboard to source code is often the first step for investigating an anomalous chart reading.</li><li><b>Contact info</b> for a person or team that has more context about what the metric means, or how it relates to the overall health of a system.</li><li><b>Stability</b> – metrics are an API too!
Some metrics are experimental and shouldn't be built into other teams' dashboards, so it's useful to be able to indicate "this metric's definition is stable" vs "this could change without warning".</li></ul><p>In Go, defining a metric with metadata might look something like this:</p><blog-code syntax=go><pre>
import "foo.com/my/monitoring/impl/metric"
var (
requestCount = metric.NewCounterInt64(
"/net/http/server/request_count",
metric.Description("A count of the HTTP requests received by this process."),
)
)
func handler(w http.ResponseWriter, req *http.Request) {
requestCount.Increment() // or .IncrementBy(1)
}
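
// The other kinds of metadata listed above could be attached in the same
// style. These option names are hypothetical, for illustration only:
var responseBytes = metric.NewCounterInt64(
	"/net/http/server/response_bytes",
	metric.Description("Total bytes written in HTTP response bodies."),
	metric.Units("bytes"),                  // machine-readable unit annotation
	metric.Contact("sre-team@example.com"), // who to ask about this metric
	metric.Stability(metric.Experimental),  // not yet safe to build dashboards on
)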
</pre></blog-code><blog-section id=metric-tags><h3 slot=title>Tags</h3><p>Tags are attached to a sample, span, or event to provide more information and context. They are a critical part of a metric, because without tags you couldn't tell which machine has unusually high load, or whether your HTTP responses are <code>status: OK</code> or <code>status: INTERNAL_SERVER_ERROR</code>.</p><p>Tags are almost always named with <code>short_snake_case</code>. There's no need to have full namespacing as in metric names, because tags are implicitly namespaced to their metric.</p><p>Tags should have low <a href=https://en.wikipedia.org/wiki/Cardinality_(data_modeling)>cardinality</a> – the number of possible key-value combinations in the data. Tagging the response status is fine because there are only a few dozen of them, but tagging the timestamp or client IP would place an enormous load on your collection and analysis infrastructure.</p><p>Tag value types are a restricted subset of metric value types: integers, text, and maybe bools. Floats are forbidden due to cardinality, and distributions don't make sense as a tag.</p><p>Just like for metrics, tags might be declared ad-hoc or statically. Static declaration of tags with their metrics improves the information available in dashboards, and helps catch programming errors before they land in prod.</p><p>In Go, defining a metric with tags might look something like this:</p><blog-code syntax=go><pre>
import "foo.com/my/monitoring/impl/metric"
var (
requestCount = metric.NewCounterInt64(
"/net/http/server/request_count",
metric.StringTag("method"),
)
)
func handler(w http.ResponseWriter, req *http.Request) {
// func (c *CounterInt64) Increment(tagValues ...metric.TagValue)
// func (c *CounterInt64) IncrementBy(count int64, tagValues ...metric.TagValue)
requestCount.Increment(req.Method)
}
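
// Tag values must stay low-cardinality. A sketch of bucketing a raw status
// code into a small "status_class" tag value ("2xx", "5xx", ...):
func statusClass(code int) string {
	switch {
	case code >= 500:
		return "5xx"
	case code >= 400:
		return "4xx"
	case code >= 300:
		return "3xx"
	case code >= 200:
		return "2xx"
	default:
		return "1xx"
	}
}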
</pre></blog-code><p>An alternative style, which is more type-safe but also more verbose, might be:</p><blog-code syntax=go><pre>
var (
	tagMethod    = metric.NewStringTag("method")
	requestCount = metric.NewCounterInt64(
		"/net/http/server/request_count",
		metric.Tags(tagMethod),
	)
)

func handler(w http.ResponseWriter, req *http.Request) {
	requestCount.Increment(tagMethod(req.Method))
}
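
// One way this type-safe style might be modeled (an assumption, not taken
// from the article): the tag declaration returns a closure that wraps values
// in an opaque TagValue, so only pre-declared tags can be passed in.
type TagValue struct {
	key, value string
}

func NewStringTag(key string) func(string) TagValue {
	return func(value string) TagValue {
		return TagValue{key: key, value: value}
	}
}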
</pre></blog-code><p>Note that neither style protects against forgetting a tag. In Go this is acceptable because zero-valued defaults are idiomatic, but other languages may prefer to require all tags to be specified when recording a sample.</p></blog-section></blog-section><blog-section><h2 slot=title>Push vs Pull</h2><p>Until I started looking into open-source monitoring frameworks, I didn't realize the "push vs pull" debate existed. I still don't fully understand it. Have we, as an industry, forgotten that TCP sockets are bidirectional? Anyway, here's a summary of the two sides.</p><blog-section><h3 slot=title>Push Model</h3><p>In the push model, processes are configured with the network address of a metric collector. They send metrics on their own schedule, either periodically (e.g. every 5 minutes) or whenever a value changes. <a href=https://github.com/etsy/statsd>statsd</a> and its various derivatives are a canonical example of the push model – to increment a counter or set a gauge, the process sends a UDP packet<blog-footnote-ref>[<a href=#fn:6>6</a>]</blog-footnote-ref> to the collector with the metric name and value.</p><p>The push model is dead simple to implement, and has the significant advantage of not requiring any sort of service discovery infrastructure. But it's also inflexible and difficult to manage – metric collection policies are hardcoded (or require complex configuration management), and load balancing between collectors is difficult.</p></blog-section><blog-section><h3 slot=title>Pull Model</h3><p>In the pull model, processes provide network access to their metrics and register themselves in a service discovery infrastructure such as <a href=https://www.consul.io/>Consul</a>. Typical implementations are an HTTP endpoint (e.g. Prometheus's <code>/metrics</code>) or a simple request-response RPC. The collectors use service discovery to find endpoints, scrape them on their own schedule, and make the data available on their own endpoints for scraping by higher-level collectors.</p><p>Two significant downsides to the pull model are the dependency on service discovery, and lack of backpressure:</p><ul><li>If your service discovery infrastructure is degraded or unavailable, then newly created processes might not be monitored properly. Monitoring the discovery infrastructure itself is also a challenge, because your collectors need some way to hard-code the discovery service metric endpoints.</li><li>A fleet of collectors can easily send more metric requests than a single process can handle. Incorrect load balancing, monitoring configuration mistakes, or aggressive retries can cause your monitoring infrastructure to degrade the system it's monitoring!</li></ul></blog-section></blog-section><blog-section><h2 slot=title>Bi-Directional Collection</h2><p>One solution to the push-vs-pull debate is to have the instrumented system connect to the collectors, receive its collection policy from them, and then push samples. This achieves the best of both worlds – the collector can set policy about which metrics to push and how often, but implementation of the policy is left up to the monitored system. Service registration is present only in vestigial form, because the monitored system can register with <i>any</i> collector instead of a globally-consistent service discovery infrastructure.</p><div style=text-align:center><pre style=text-align:left;display:inline-block;margin:0>
+------------------+ +------------+
| Monitored System | | Collector |
+------------------+ +------------+
|| ||
|| Announcement ||
|| ---------------------------> ||
|| ||
|| Collection Policy ||
|| <--------------------------- ||
|| ||
|| Samples ||
|| ---------------------------> ||
|| ||
|| Samples ||
|| ---------------------------> ||
|| ||
|| ... ||
|| ||
</pre></div><p>The monitored system starts the process by connecting to the collector, and announcing its identity. The identity consists of things like cluster name, machine ID, process ID, or other ways to distinguish processes from each other.</p><p>A monitored system might announce multiple identities, for example if it's proxying metrics from some other source. A process that scrapes Apache log files to count errors might report two identities, one for itself and one for Apache. Each identity has independent (and possibly overlapping) sets of metrics.</p><blog-section><h3 slot=title>Collection Policies</h3><p>A large binary might be instrumented with many thousands of metrics, but only a subset will be of interest to the SRE team. Furthermore, some metrics should be updated more often than others – and the details can change as the SRE team refines dashboards or investigates ongoing service degradation. The rules about which metrics to push, and how frequently to push them, are encoded in a <i>collection policy</i> that the collector sends to the monitored system.</p><p>The following example policy pushes metrics starting with <code>/build/</code> every 10 minutes, and metrics starting with <code>/proc/</code> or <code>/net/rpc/server/</code> every 5 seconds. The metric <code>/net/rpc/client/response_latency</code> is also pushed every 5 seconds, but other metrics under <code>/net/rpc/client/</code> are not pushed.</p><blog-code syntax=yaml><pre>
metrics:
  - prefix: "/build/"
    interval: {seconds: 600}
  - prefix: "/proc/"
    interval: {seconds: 5}
  - prefix: "/net/rpc/server/"
    interval: {seconds: 5}
  - name: "/net/rpc/client/response_latency"
    interval: {seconds: 5}
</pre></blog-code><p>A collector might also request specific <a href=#events>events</a> and <a href=#traces>trace spans</a>, or all of them.</p><p>Note that there is no hard requirement on the monitored system to push at the specified interval. It might push less often if it's running low on CPU allocation, or perform an unscheduled push during shutdown.</p></blog-section><blog-section><h3 slot=title>Sample Compression</h3><p>An unexpected benefit of pushing metrics in a reliable connection-oriented protocol is the opportunity for cheap data compression. Metric names, unchanged sample values, and timestamps are easy wins to reduce bandwidth requirements in your metric collection.</p><blog-section id=compress-metric-names><h4 slot=title>Metric Names</h4><p>When the monitored system pushes a metric sample, it can allocate a connection-unique ID to that metric name. For later pushes, the name doesn't need to be re-transmitted. This is an especially good fit for <a href=https://developers.google.com/protocol-buffers/>protocol buffers</a>, because each message field is identified by an ID. Therefore, a sample push can be encoded in the protobuf wire format as a sequence of <code>(metric_id, metric_value)</code> tuples, where the <code>metric_value</code> is of the protobuf type corresponding to the metric type.</p><p>A brief example, showing the original metric definition on the left, and the logical protobuf encoding on the right:</p><table style="margin:0 auto"><tr><td><pre style=margin:20px>
metric {
name: "/proc/thread_count"
type: INT64
per_connection_metric_id: 1
}
metric {
name: "/proc/working_directory"
type: TEXT
per_connection_metric_id: 2
}
</pre></td><td style="vertical-align:top;border-left:1px solid"><pre style=margin:20px>
message {
int64 proc_thread_count = 1;
string proc_working_directory = 2;
}
</pre></td></tr></table></blog-section><blog-section id=compress-unchanged-samples><h4 slot=title>Unchanged Samples</h4><p>Metric values often change less frequently than their collection interval. Instead of resending the same value over and over, the protocol can have a <code>repeated int64 unchanged_metric_id</code> field. Any metric IDs in this list will be treated as if they were sent using the last value seen in the current connection.</p></blog-section><blog-section id=compress-timestamps><h4 slot=title>Timestamps</h4><p>If timestamps are a metric type encoded into the protocol instead of just using integers, then they can be compressed using a timestamp base. For example, instead of sending <code>int64 timestamp</code> for each sample, send <code>int64 timestamp_base</code> in the announcement message and <code>int32 timestamp_offset</code> in the samples. Then reconstruct the original values in the collector as <code>timestamp_base + timestamp_offset</code>.</p><p>This technique works regardless of whether you use a fixed-length integer field, or a protobuf-style varint. Fixed-length fields will save 50% of each timestamp per sample; varint savings will vary depending on how small the offsets are. Note that for protobuf, choosing a timestamp base in the future and using negative offsets may result in more compact output due to <a href=https://developers.google.com/protocol-buffers/docs/encoding#signed-integers>ZigZag encoding</a>.</p><p>The base time must be updated to a larger value if the offset would overflow a signed 32-bit integer. The resolution of your timestamps will affect how often the base time must be updated:</p><table><thead><tr><th>Maximum Offset</th><th>Seconds</th><th>Minutes</th><th>Hours</th><th>Days</th></tr></thead><tbody><tr><td>2<span style=display:none>^</span><span style=vertical-align:super;font-size:smaller>31</span> nanoseconds</td><td>2.15</td><td>-</td><td>-</td><td>-</td></tr><tr><td>2<span style=display:none>^</span><span style=vertical-align:super;font-size:smaller>31</span> microseconds</td><td>2147.48</td><td>35.79</td><td>-</td><td>-</td></tr><tr><td>2<span style=display:none>^</span><span style=vertical-align:super;font-size:smaller>31</span> milliseconds</td><td>-</td><td>-</td><td>596.52</td><td>24.86</td></tr></tbody></table></blog-section></blog-section></blog-section><blog-footnotes slot=footnotes><hr><ol><li id=fn:1><p>Or maybe <code>"opt_foo"</code> and <code>"opt_bar"</code>. <code>"OptFoo"</code> is right out.</p></li><li id=fn:2><p>Distributions are sometimes called "histograms", for example by DataDog and Prometheus, but this is technically incorrect – a histogram is a visualization of a distribution.</p></li><li id=fn:3><p>This may seem like an odd metric value, but it can be useful when diagnosing routing-related network issues.</p></li><li id=fn:4><p>If you ever feel the urge to write your own <a href=https://www.robustperception.io/conways-life-in-prometheus/>Turing-complete configuration language</a>, take a deep breath and step back for a bit. Go for a walk around the block. Look at some trees.</p></li><li id=fn:5><p>Be careful about depending on spans as counters. Many tracing systems record only a subset of the traces they receive, or discard spans with durations outside of their recall window.
You may find the implied metrics to be missing data from times when they are most interesting.</p></li><li id=fn:6><p>UDP <i>could</i> be a reasonable transport for metrics if you used it as the foundation for a reliable connection-oriented protocol (à la QUIC), but statsd does not do this. There is no mechanism to resend lost updates, ignore duplicates, or ensure correct sequencing of gauge values. Embedding the metric name in each packet is enormously wasteful of bandwidth. <a href=https://githubengineering.com/brubeck/>statsd collection is difficult to load balance across threads</a>, and very difficult to balance across collectors running on separate machines.</p></li></ol></blog-footnotes></blog-article>2018-03-03T18:52:24Zhaskell-cpython: Calling Python libraries from Haskell2010-10-28T04:04:18Zurn:uuid:979862e7-d90d-4b36-b8f1-87f9fa1954b0<blog-article posted=2010-10-28T04:04:18Z><h1 slot=title>haskell-cpython: Calling Python libraries from Haskell</h1><div slot=summary><p>Haskell's a great language; it's efficient, consistent, terse, reliable, and so on. But if there's one thing Haskell's not, it's "batteries included". Compared to popular dynamic languages, such as Python and Ruby, Haskell has a very limited module library. Writing bindings to Python libraries (via the <a href=http://docs.python.org/3.1/c-api/>Python/C API</a>) is an easy and practical approach to reusing the Python community's work.</p><p><b>Code</b>: <a href=https://john-millikin.com/code/haskell-cpython>https://john-millikin.com/code/haskell-cpython</a> (<a href=https://github.com/jmillikin/haskell-cpython>GitHub mirror</a>)</p></div><blog-section><h2 slot=title>Preflight</h2><p>In addition to standard Haskell development tools (GHC, Cabal, etc), building the example code requires the Python 3.1 headers. In Debian/Ubuntu, <kbd>apt-get install python3.1-dev</kbd>.</p><p>Once the necessary libraries are installed, you should be able to run the following test program. If the program won't compile, or crashes, double-check that GHC and Cabal are installed properly.</p><blog-code syntax=haskell><pre>
module Main where
import qualified Data.Text.IO as T
import qualified CPython as Py
main :: IO ()
main = do
    Py.initialize
    Py.getVersion >>= T.putStrLn
</pre></blog-code><p>The program should give output like this:</p><blog-code><pre>
$ runhaskell version.hs
3.1.2 (release31-maint, Sep 17 2010, 20:37:45)
[GCC 4.4.5]
</pre></blog-code></blog-section><blog-section id=pythons-built-in-types><h2 slot=title>Python's built-in types</h2><p>Like any self-respecting language, Python has a variety of built-in types: integers, text, lists, tuples, etc. The first step to using any Python library is marshaling Haskell values into an equivalent Python value. A full list of types supported by the CPython bindings is available in the <a href=https://hackage.haskell.org/package/cpython>API reference</a>.</p><p>Let's marshal some basic stuff, using <code>print()</code> to see what Python makes of it:</p><blog-code syntax=haskell><pre>
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.Text as T
import qualified Data.ByteString.Char8 as B
import System.IO (stdout)
import qualified CPython as Py
import qualified CPython.Protocols.Object as Py
import qualified CPython.Types as Py
main :: IO ()
main = do
    Py.initialize
    unicode <- Py.toUnicode "Hello World!"
    Py.print unicode stdout
    bytes <- Py.toBytes (B.pack "Hello\NULWorld!\ETX")
    Py.print bytes stdout
    float <- Py.toFloat 1.2345
    Py.print float stdout
    int <- Py.toInteger 12345
    Py.print int stdout
    list <- Py.toList [Py.toObject int]
    Py.print list stdout
    tuple <- Py.toTuple [Py.toObject int]
    Py.print tuple stdout
    set <- Py.toSet [Py.toObject int]
    Py.print set stdout
</pre></blog-code><blog-code><pre>
$ runhaskell marshaling.hs
'Hello World!'
b'Hello\x00World!\x03'
1.2345
12345
[12345]
(12345,)
{12345}
</pre></blog-code><p>That's a big chunk to digest at once, so let's break it down a bit:</p><ul><li>Python's <code>unicode</code>, <code>bytes</code>, <code>float</code>, and <code>int</code> types match up precisely with Haskell's <code>Text</code>, <code>ByteString</code>, <code>Double</code>, and <code>Integer</code>, respectively. Byte literals are prefixed with <code>b</code>, to reduce confusion with unicode strings.</li><li>Python's tuples are similar to Haskell's, except they may contain any number of elements. Single-element tuples are indicated by a trailing comma.</li><li>Python's lists are heterogeneous and support constant-time indexing; in Haskell, we use the <code>SomeObject</code> <abbr title="Generalized Algebraic Data Type">GADT</abbr> to represent the contents of lists (and of arbitrary Python objects in general). Every value stored in a list must first be converted to a <code>SomeObject</code>, using <code>Py.toObject</code>.</li><li>Python's sets are also heterogeneous and constant-time; the special syntax <code>{1, 2, 3}</code> is equivalent to Haskell's <code>Data.Set.fromList [1, 2, 3]</code>.</li></ul></blog-section><blog-section><h2 slot=title>Methods and Protocols</h2><p>Every Python object has a selection of <i>methods</i>, which can be called by external code to do stuff. If you've ever used a pseudo-<abbr title=Object-Oriented>OO</abbr> language like C++ or Java, you've used methods before. Some methods are exposed directly via Python/C; others must be queried as attributes from an object.</p><p>When separate types have similar methods, those methods are usually standardized into a <i>protocol</i>. Python protocols are like Haskell typeclasses, except not type checked; any value with the appropriate methods is said to implement a protocol. For example, <code>tuple</code>, <code>list</code>, and <code>bytes</code> values all implement the <i>sequence</i> protocol.</p></blog-section><blog-section><h2 slot=title>Importing modules</h2><p>There's only so much you can do with the built-in types; sooner or later, you'll want to use one of Python's rich selection of libraries. That's why you're reading this, right?</p><p>Modules are exposed to the runtime as standard Python objects, and their contents (variables, procedures, class definitions) can be queried like any other object attribute. Let's look at an example of calling <code>os.uname()</code>:</p><blog-code syntax=haskell><pre>
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.Text as T
import System.IO (stdout)
import qualified CPython as Py
import qualified CPython.Protocols.Object as Py
import qualified CPython.Types as Py
import qualified CPython.Types.Module as Py
main :: IO ()
main = do
    Py.initialize
    os <- Py.importModule "os"
    uname <- Py.getAttribute os =<< Py.toUnicode "uname"
    res <- Py.callArgs uname []
    Py.print res stdout
</pre></blog-code><blog-code><pre>
$ runhaskell import.hs
('Linux', 'desktop', '2.6.35-22-generic', '#35-Ubuntu SMP Sat Oct 16 20:45:36 UTC 2010', 'x86_64')
</pre></blog-code><p>The <code>getAttribute</code> and <code>callArgs</code> functions are both part of the object protocol; the former works on all objects, while the latter works on objects with the <code>__call__()</code> magic method.</p><p>A module can be imported any number of times, but will only be loaded once per interpreter. This comes in very useful in Haskell, which has no native support for static data – if you need to call a Python method, just import its module at the call site.</p><p>Of course, even inexpensive operations can become a bottleneck if performed often enough; importing an already-loaded module is fast, but the full lookup still involves several string comparisons and a marshal. If the same Python function needs to be run many times, consider querying it once and caching the function object.</p></blog-section><blog-section><h2 slot=title>Catching Exceptions</h2><p>If anybody's been playing around with the above examples, they might have run into the following problem:</p><blog-code><pre>
$ runhaskell exceptions.hs
exceptions.hs: <CPython exception>
</pre></blog-code><p>Because Python exceptions are themselves Python objects, printing them requires an IO action. In fact, because Python methods can perform arbitrary actions, printing the same exception twice might give different output! Therefore, the <code>Show</code> instance for Python exceptions is mostly worthless.</p><p>Every Python exception has three components: a class, a value, and an optional traceback (i.e. stack trace). The class is generally not interesting, but the value can be printed to see what went wrong:</p><blog-code syntax=haskell><pre>
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Control.Exception as E
import qualified Data.Text as T
import System.IO (stdout)
import qualified CPython as Py
import qualified CPython.Protocols.Object as Py
import qualified CPython.Types.Exception as Py
import qualified CPython.Types.Module as Py
main :: IO ()
main = do
    Py.initialize
    E.handle onException $ do
        Py.importModule "no-such-mod"
        return ()
onException :: Py.Exception -> IO ()
onException exc = Py.print (Py.exceptionValue exc) stdout
</pre></blog-code><blog-code><pre>
$ runhaskell exceptions.hs
ImportError('No module named no-such-mod',)
</pre></blog-code><p>This'll do for quick and dirty scripts, but more complex errors will benefit from using the <a href=http://docs.python.org/3.1/library/traceback.html>traceback</a> module. Use procedures like <code>print_exception()</code> to get nice, pretty-printed error messages. If an exception originated in Python code, a stack trace will also be printed.</p><blog-code syntax=haskell><pre>
import qualified CPython.Constants as Py
import qualified CPython.Types as Py
-- ...
onException exc = do
    tb <- case Py.exceptionTraceback exc of
        Just obj -> return obj
        Nothing -> Py.none
    mod <- Py.importModule "traceback"
    proc <- Py.getAttribute mod =<< Py.toUnicode "print_exception"
    Py.callArgs proc [Py.exceptionType exc, Py.exceptionValue exc, tb]
    return ()
</pre></blog-code><blog-code><pre>
$ runhaskell exceptions.hs
ImportError: No module named no-such-mod
</pre></blog-code></blog-section><blog-section id=putting-it-all-together-binding-mimetypes><h2 slot=title>Putting it all together: binding 'mimetypes'</h2><p>Here's the payoff: implementing a Haskell library on top of an existing Python library. For this I'll use the <a href=http://docs.python.org/3.1/library/mimetypes.html>mimetypes</a> module, since it's simple and self-contained; more useful bindings might be to the <a href=http://www.feedparser.org/>Universal Feed Parser</a> or <a href=http://docutils.sourceforge.net/rst.html>docutils</a>.</p><p>Even a simple binding is a bit big to read all at once as an example, so I've split it up. First is the imports and exports; no explanation needed, hopefully.</p><blog-code syntax=haskell><pre>
{-# LANGUAGE OverloadedStrings #-}
module MimeTypes
    ( MimeTypes
    , newMimeTypes
    , guessExtension
    , guessType
    ) where
import qualified Data.Text as T
import qualified CPython as Py
import qualified CPython.Constants as Py
import qualified CPython.Protocols.Object as Py
import qualified CPython.Types as Py
import qualified CPython.Types.Module as Py
import qualified CPython.Types.Tuple as PyT
</pre></blog-code><p>Next we have a data type for matching the <code>mimetypes.MimeTypes</code> class; it doesn't have the full complement of attributes, but enough for demonstration. <code>newMimeTypes</code>'s parameters mimic those of the Python class's constructor.</p><p>Note that there are no Python types exposed in this module's public interface; clients of this module are insulated from the internal implementation. Aside from the absurdly heavy dependency list, there is no sign that this module is just a binding.</p><blog-code syntax=haskell><pre>
data MimeTypes = MimeTypes
    { mtGuessExtension :: Py.SomeObject
    , mtGuessType :: Py.SomeObject
    }
newMimeTypes :: [FilePath] -> Bool -> IO MimeTypes
newMimeTypes files strict = do
    Py.initialize
    mod <- Py.importModule "mimetypes"
    cls <- Py.getAttribute mod =<< Py.toUnicode "MimeTypes"
    -- Py.toUnicode takes Text, so the FilePath values are packed first
    pyFiles <- Py.toList =<< mapM (fmap Py.toObject . Py.toUnicode . T.pack) files
    pyStrict <- if strict then Py.true else Py.false
    mt <- Py.callArgs cls [Py.toObject pyFiles, Py.toObject pyStrict]
    pyGuessExtension <- Py.getAttribute mt =<< Py.toUnicode "guess_extension"
    pyGuessType <- Py.getAttribute mt =<< Py.toUnicode "guess_type"
    return $ MimeTypes pyGuessExtension pyGuessType
</pre></blog-code><p>If you've any sense, one of the first things you thought after reading that was "golly, that sure is ugly". And you're right – it is ugly. Anybody who wants to make a serious go of binding large-scale Python libraries (such as Django) is heavily encouraged to write something similar to <a href=http://www.cse.unsw.edu.au/~chak/haskell/c2hs/>c2hs</a> to automate the worst of it. Call it py2hs?</p><p>However, aside from being dreadfully verbose, it's not particularly complex. Parameters are marshaled from Haskell types into their Python equivalents, packaged up into a parameter list, and used to call the class constructor. After the <code>MimeTypes</code> object has been created, its <code>guess_extension</code> and <code>guess_type</code> methods are queried and cached for later use.</p><p>Which brings us to:</p><blog-code syntax=haskell><pre>
guessExtension :: MimeTypes -> T.Text -> Bool -> IO (Maybe T.Text)
guessExtension mt type_ strict = do
    pyType <- Py.toUnicode type_
    pyStrict <- if strict then Py.true else Py.false
    res <- Py.callArgs (mtGuessExtension mt) [Py.toObject pyType, Py.toObject pyStrict]
    textOrNone res
guessType :: MimeTypes -> T.Text -> Bool -> IO (Maybe T.Text, Maybe T.Text)
guessType mt url strict = do
    pyURL <- Py.toUnicode url
    pyStrict <- if strict then Py.true else Py.false
    res <- Py.callArgs (mtGuessType mt) [Py.toObject pyURL, Py.toObject pyStrict]
    Just tup <- Py.cast res
    [pyType, pyEncoding] <- Py.fromTuple tup
    type_ <- textOrNone pyType
    encoding <- textOrNone pyEncoding
    return (type_, encoding)
textOrNone :: Py.SomeObject -> IO (Maybe T.Text)
textOrNone obj = do
    isNone <- Py.isNone obj
    if isNone
        then return Nothing
        else do
            Just cast <- Py.cast obj
            Just `fmap` Py.fromUnicode cast
</pre></blog-code><p>Really, it's more of the same; marshal parameters, call, dissect the result. Testing for <code>None</code> is common enough that I moved it to a helper; more complex bindings might have dozens of such helpers for special cases. Are you listening, py2hs author?</p><p>Finally, load up our new binding into GHCi and see if it works:</p><blog-code><pre>
$ ghci -XOverloadedStrings
GHCi, version 6.12.1: http://www.haskell.org/ghc/ :? for help
Prelude> :l MimeTypes
[1 of 1] Compiling MimeTypes ( MimeTypes.hs, interpreted )
Ok, modules loaded: MimeTypes.
*MimeTypes> types <- newMimeTypes [] False
</pre></blog-code><p>It loaded! And it didn't crash! We're off to a good start; lets see if our <code>guessType</code> works:</p><blog-code><pre>
*MimeTypes> import Data.Text
*MimeTypes Data.Text> guessType types "foo.txt" True
(Just "text/plain",Nothing)
*MimeTypes Data.Text> guessType types "foo.html.gz" True
(Just "text/html",Just "gzip")
</pre></blog-code><p>Looks good; it's picking up the file type, and the optional encoding. Now for <code>guessExtension</code>:</p><blog-code><pre>
*MimeTypes Data.Text> guessExtension types "text/plain" True
Just ".ksh"
</pre></blog-code><p>Hmm.</p><p><a href=http://bugs.python.org/issue1043134>http://bugs.python.org/issue1043134</a></p><p>Hmm 🤔</p></blog-section></blog-article>2010-10-28T04:04:18ZMonad is not difficult2010-02-23T00:57:33Zurn:uuid:8696bb46-9a7b-411c-8451-7aa59e3e62ff<blog-article posted=2010-02-23T00:57:33Z><h1 slot=title>Monad is not difficult</h1><p>In Haskell, the typeclass <code>Monad</code> is a way for programmers to customize how sequences of function calls are run. Originally the goal was just to make IO-heavy code easier to read, but it turns out that Monad-shaped APIs are wonderfully flexible and can simplify all sorts of programming problems.</p><p>For example, say we want to define a function to look up two people in a database, and return information about the older one (or the first, if both are the same age). The database API already defines a way to look up a person by name, which returns NULL if the requested name isn’t found. Our function should return NULL if either name cannot be found.</p><p>The imperative-style implementation would look like this:</p><blog-code syntax=c><pre>
Maybe<Person> findOldest(Database db, Name name1, Name name2) {
    Maybe<Person> maybePerson1 = findByName(db, name1);
    if (maybePerson1 == Nothing) {
        return Nothing;
    }
    Person person1 = maybePerson1.Value();
    Maybe<Person> maybePerson2 = findByName(db, name2);
    if (maybePerson2 == Nothing) {
        return Nothing;
    }
    Person person2 = maybePerson2.Value();
    if (person1.BirthDate > person2.BirthDate) {
        return person2;
    }
    return person1;
}
</pre></blog-code><p>Other than verbosity, this code is fairly readable, because the language has built-in support for returning early from a function.</p><p>But what happens if we write it in a declarative-style language? In most declarative languages, a function is a single expression, and every branch must return a value. This prevents the programmer from returning early.</p><blog-code syntax=haskell><pre>
findOldest :: Database -> Name -> Name -> Maybe Person
findOldest db name1 name2 =
    case findByName db name1 of
        Nothing -> Nothing
        Just person1 -> case findByName db name2 of
            Nothing -> Nothing
            Just person2 -> if birthDate person1 > birthDate person2
                then Just person2
                else Just person1
</pre></blog-code><p>You can see that each time the programmer wants to check for an empty result, they have to add another level of indentation. This sort of code quickly becomes difficult to read and maintain.</p><p>Monad provides a solution, because it allows the programmer to customize how to handle functions that return an empty value. If the programmer instructs the compiler to return early when an empty value is encountered, then she doesn’t have to write all those checks manually.</p><blog-code syntax=haskell><pre>
instance Monad Maybe where
    return x = Just x
    val >>= f = case val of
        Nothing -> Nothing
        Just x -> f x
findOldest :: Database -> Name -> Name -> Maybe Person
findOldest db name1 name2 = do
    person1 <- findByName db name1
    person2 <- findByName db name2
    return (if birthDate person1 > birthDate person2
        then person2
        else person1)
</pre></blog-code><p>The same pattern extends to all sorts of code that needs extra control over the execution sequence. With only a small tweak, the instance for Maybe can be used for Either:</p><blog-code syntax=haskell><pre>
instance Monad (Either a) where
    return x = Right x
    val >>= f = case val of
        Left err -> Left err
        Right x -> f x
</pre></blog-code><p>Monad instances can also be defined for types which change how functions are called, rather than whether they are called. The Reader type adds an implicit environment to a function:</p><blog-code syntax=haskell><pre>
data Reader env a = Reader (env -> a)
ask :: Reader env env
ask = Reader (\env -> env)
runReader :: Reader env a -> env -> a
runReader (Reader run) env = run env
instance Monad (Reader env) where
    return x = Reader (\_ -> x)
    val >>= f = Reader (\env -> runReader (f (runReader val env)) env)
</pre></blog-code><p>Either is used instead of exceptions, and Reader is used instead of global state, because their behavior can be mechanically checked by the compiler. By offloading concerns about error and state handling to the compiler, the programmer can focus on the higher-level behavior of their program.</p><p>Exercise for the reader: Given this definition, how would you define a Monad instance for the State type?</p><blog-code syntax=haskell><pre>
data State st a = State (st -> (st, a))
get :: State st st
get = State (\st -> (st, st))
put :: st -> State st ()
put st = State (\_ -> (st, ()))
</pre></blog-code></blog-article>2010-02-23T00:57:33ZUnderstanding Iteratees2010-01-18T19:40:37Zurn:uuid:5bdda5f1-7d26-4add-b17a-5c138725ba19<blog-article posted=2010-01-18T19:40:37Z><h1 slot=title>Understanding Iteratees</h1><p><a href=http://okmij.org/ftp/Streams.html>Iteratees</a> are an abstraction discovered by Oleg Kiselyov, which provide a performant, predictable, and safe alternative to lazy I/O. Though the data types involved are simple, their relationship to incremental processing is not obvious, and existing documentation ranges in quality from merely dense to outright baffling. This article attempts to clarify the concepts and use underlying iteratees.</p><p>Please note that these are my notes, as I attempt to implement iteratee-based libraries. I may have misunderstood minor or major parts of iteratees. If in doubt, the final authority is Oleg – though understanding his answers requires a saving throw vs. confusion. Please <a href=mailto:john@john-millikin.com>e-mail me</a> any comments or suggestions.</p><p><i>2010–08–19: the code available in this article has been expanded and packaged as the <a href=/software/haskell-enumerator>enumerator</a> library.</i></p><blog-section id=iteratees-vs-lazy-io><h3 slot=title>Iteratees vs. Lazy I/O</h3><p>Lazy I/O – e.g., <code>hGetContents</code> and friends – is known to have several shortcomings. Most notably, IO errors can occur in pure code and <code>Handle</code>s may remain open for arbitrary periods of time. Oleg notes<blog-footnote-ref>[<a href=#fn:1>1</a>]</blog-footnote-ref> that this can lead to unexpected failures, due to resource exhaustion.</p><p>Iteratees do not suffer from these problems. Their resource use is bounded and predictable, and the type system provides guarantees that limited resources are released when no longer needed. Notably, iteratees can process arbitrarily large inputs in constant space.</p></blog-section><blog-section><h3 slot=title>Implementing iteratees</h3><p>There are at least five generic iteratee libraries, each with differing type signatures and semantics: Oleg's <a href=http://okmij.org/ftp/Haskell/Iteratee/Iteratee.hs>Iteratee.hs</a>, <a href=http://okmij.org/ftp/Haskell/Iteratee/IterateeM.hs>IterateeM.hs</a>, & <a href=http://okmij.org/ftp/Haskell/Iteratee/IterateeMCPS.hs>IterateeMCPS.hs</a>, John Lato's <a href=http://hackage.haskell.org/package/iteratee><i>iteratee</i> package</a>, and a <a href=http://therning.org/magnus/archives/735>post by Per Magnus Therning</a>.</p><p>This page documents a sixth implementation, based on IterateeM, with simplified error handling and naming conventions (hopefully) more obvious to the average Haskell programmer.</p><blog-code syntax=haskell><pre>
data Chunk a
    = Chunk [a]
    | EOF
    deriving (Show, Eq)

data Step e a m b
    = Continue (Chunk a -> Iteratee e a m b)
    | Yield b (Chunk a)
    | Error e

newtype Iteratee e a m b = Iteratee
    { runIteratee :: m (Step e a m b)
    }
</pre></blog-code><p>In general, an iteratee begins in the <code>Continue</code> state. As each chunk is passed to the continuation, the iteratee may return the next step, which is one of:</p><style>dt{margin:1em}</style><dl><dt><code>Continue</code></dt><dd>The iteratee requires more input before it can produce a result.</dd><dt><code>Yield</code></dt><dd>The iteratee has received enough input to generate a result, along with left-over input. If the iteratee will no longer accept input, it should yield <code>EOF</code>. If no input remains, but the iteratee can still accept more, it should yield <code>Chunk []</code>.</dd><dt><code>Error</code></dt><dd>The iteratee experienced an error which prevents it from proceeding further. The type of error contained will depend on the enumerator and/or iteratee – common choices are <code>String</code> and <code>SomeException</code>.</dd></dl><p>Based on these semantics, some simple instances can be created:</p><blog-code syntax=haskell><pre>
-- import Data.Monoid (Monoid(..))
instance Monoid (Chunk a) where
    mempty = Chunk []
    mappend (Chunk xs) (Chunk ys) = Chunk $ xs ++ ys
    mappend _ _ = EOF

instance Functor Chunk where
    fmap _ EOF = EOF
    fmap f (Chunk xs) = Chunk $ map f xs

instance (Show a, Show b, Show e) => Show (Step e a m b) where
    showsPrec d step = showParen (d > 10) $ case step of
        (Continue _) -> s "Continue"
        (Yield b chunk) -> s "Yield " . sp b . s " " . sp chunk
        (Error err) -> s "Error " . sp err
      where
        s = showString
        sp :: Show a => a -> ShowS
        sp = showsPrec 11
</pre></blog-code><p>Slightly more complex is the <code>Monad</code> instance for iteratees. The first iteratee is run, and if it yielded a value, that value is fed into the second iteratee.</p><blog-code syntax=haskell><pre>
-- import Control.Monad.Trans (MonadIO, MonadTrans, lift, liftIO)
instance Monad m => Monad (Iteratee e a m) where
    return x = Iteratee . return . Yield x $ Chunk []
    m >>= f = Iteratee $ runIteratee m >>= \mStep -> case mStep of
        Continue k -> return $ Continue ((>>= f) . k)
        Error err -> return $ Error err
        Yield x (Chunk []) -> runIteratee $ f x
        Yield x chunk -> runIteratee (f x) >>= \r -> case r of
            Continue k -> runIteratee $ k chunk
            Error err -> return $ Error err
            -- runIteratee (f x) does not consume any input; if it
            -- returns Yield, then its "extra" input must be
            -- (Chunk []) and can be ignored.
            Yield x' _ -> return $ Yield x' chunk

instance MonadTrans (Iteratee e a) where
    lift m = Iteratee $ m >>= runIteratee . return

instance MonadIO m => MonadIO (Iteratee e a m) where
    liftIO = lift . liftIO

instance Monad m => Functor (Iteratee e a m) where
    fmap f i = i >>= return . f
</pre></blog-code><p>Next, let's define a few simple primitive combinators for building iteratees from pure functions:</p><blog-code syntax=haskell><pre>
returnI :: Monad m => Step e a m b -> Iteratee e a m b
returnI = Iteratee . return
liftI :: Monad m => (Chunk a -> Step e a m b) -> Iteratee e a m b
liftI k = returnI $ Continue (returnI . k)
yield :: Monad m => b -> Chunk a -> Iteratee e a m b
yield x chunk = returnI $ Yield x chunk
continue :: Monad m => (Chunk a -> Iteratee e a m b) -> Iteratee e a m b
continue k = returnI $ Continue k
throwError :: Monad m => e -> Iteratee e a m b
throwError err = returnI $ Error err
</pre></blog-code><p>These combinators are sufficient to define simple iteratees; for example, a variation of <code>dropWhile</code>:</p><blog-code syntax=haskell><pre>
-- import Prelude hiding (dropWhile)
-- import qualified Prelude as Prelude
dropWhile :: Monad m => (a -> Bool) -> Iteratee e a m ()
dropWhile f = liftI step where
    step (Chunk xs) = case Prelude.dropWhile f xs of
        [] -> Continue $ returnI . step
        xs' -> Yield () (Chunk xs')
    step EOF = Yield () EOF
</pre></blog-code><p>Or an iteratee for printing received chunks to stdout, useful for debugging:</p><blog-code syntax=haskell><pre>
printChunks :: MonadIO m => Show a => Bool -> Iteratee e a m ()
printChunks printEmpty = continue step where
    step (Chunk []) | not printEmpty = continue step
    step (Chunk xs) = liftIO (print xs) >> continue step
    step EOF = liftIO (putStrLn "EOF") >> yield () EOF
</pre></blog-code><p>Finally, to extract the final result from an iteratee, it's sufficient to feed it <code>EOF</code> and check the returned <code>Step</code>. Note that a "well-behaved" iteratee continuation will always return <code>Yield</code> or <code>Error</code> in response to <code>EOF</code> – iteratees which return <code>Continue</code> may loop forever, depending on their monadic behavior.</p><blog-code syntax=haskell><pre>
run :: Monad m => Iteratee e a m b -> m (Either e b)
run i = runIteratee i >>= check where
    check (Continue k) = runIteratee (k EOF) >>= check
    check (Yield x _) = return $ Right x
    check (Error e) = return $ Left e
</pre></blog-code></blog-section><blog-section><h3 slot=title>Enumerators</h3><p>Iteratees consume data from a sequence of input chunks. To generate those chunks, we define <i>enumerators</i> (and enumerator composition operators).</p><blog-code syntax=haskell><pre>
type Enumerator e a m b = Step e a m b -> Iteratee e a m b
infixl 1 >>==, ==<<
(>>==) :: Monad m =>
    Iteratee e a m b ->
    (Step e a m b -> Iteratee e a' m b') ->
    Iteratee e a' m b'
m >>== f = Iteratee (runIteratee m >>= runIteratee . f)

(==<<) :: Monad m =>
    (Step e a m b -> Iteratee e a' m b') ->
    Iteratee e a m b ->
    Iteratee e a' m b'
f ==<< m = m >>== f
</pre></blog-code><p>Note that the <code>Enumerator</code> type is semantically equivalent to:</p><blog-code syntax=haskell><pre>
type Enumerator e a m b = Step e a m b -> m (Step e a m b)
</pre></blog-code><p>Simple enumerators can be defined in terms of existing combinators. The basic format of an enumerator is that when it receives a <code>Continue</code> step, it passes a chunk to the continuation to generate its returned iteratee. Other step types are passed through unchanged.</p><blog-code syntax=haskell><pre>
enumList :: Monad m => [a] -> Enumerator e a m b
enumList xs (Continue k) = case xs of
    [] -> k EOF
    (x:xs') -> k (Chunk [x]) >>== enumList xs'
enumList _ step = returnI step
</pre></blog-code><p>More complex enumerators require building the result manually. Note that while the recursive step is much larger in this example, the fundamental layout (loop on <code>Continue</code>, pass on others) remains.</p><blog-code syntax=haskell><pre>
-- import Control.Exception (SomeException, try)
-- import Data.ByteString (ByteString, packCStringLen)
-- import Foreign.Marshal.Alloc (allocaBytes)
-- import System.IO (Handle, hGetBuf)
enumHandle :: Handle -> Enumerator String ByteString IO b
enumHandle h = Iteratee . allocaBytes bufferSize . loop where
    bufferSize = 4096
    loop (Continue k) = do_read k
    loop step = const $ return step
    do_read k p = do
        n <- try $ hGetBuf h p bufferSize
        case (n :: Either SomeException Int) of
            Left err -> return $ Error $ show err
            Right 0 -> return $ Continue k
            Right n' -> do
                bytes <- packCStringLen (p, n')
                step <- runIteratee (k (Chunk [bytes]))
                loop step p
</pre></blog-code><p>In some cases, it might make more sense to define this enumerator in terms of bytes rather than byte strings. The required changes are minor – the bytes are stored directly in the <code>Chunk</code> list.</p><blog-code syntax=haskell><pre>
-- import Data.Word (Word8)
-- import qualified Foreign.Marshal.Array as F
enumHandle :: Handle -> Enumerator String Word8 IO b
…
            Right n' -> do
                bytes <- F.peekArray n' p
                step <- runIteratee (k (Chunk bytes))
                loop step p
</pre></blog-code></blog-section><blog-section><h3 slot=title>Enumeratees</h3><p>Enumerators generate data, iteratees consume it. When a value needs to generate a stream using another stream as input, it is named an <i>enumeratee</i>.</p><blog-code syntax=haskell><pre>
type Enumeratee e aOut aIn m b = Step e aIn m b -> Iteratee e aOut m (Step e aIn m b)
</pre></blog-code><p>Most interesting transformations in iteratee-based code are enumeratees. For example, <code>map</code> can be encoded as an enumeratee:</p><blog-code syntax=haskell><pre>
checkDone :: Monad m =>
    ((Chunk a -> Iteratee e a m b) -> Iteratee e a' m (Step e a m b)) ->
    Enumeratee e a' a m b
checkDone _ (Yield x chunk) = return $ Yield x chunk
checkDone f (Continue k) = f k
checkDone _ (Error err) = throwError err
mapI :: Monad m => (ao -> ai) -> Enumeratee e ao ai m b
mapI f = checkDone $ continue . step where
    step k EOF = yield (Continue k) EOF
    step k (Chunk []) = continue $ step k
    step k chunk = k (fmap f chunk) >>== mapI f
</pre></blog-code><p>A more complex example: <code>sequenceI</code> converts an iteratee to an enumeratee, by feeding it input until it returns <code>EOF</code>. This is useful for chaining iteratees together, to support embedded streams.</p><blog-code syntax=haskell><pre>
finished :: Monad m => Iteratee e a m Bool
finished = liftI $ \chunk -> case chunk of
    EOF -> Yield True EOF
    _ -> Yield False chunk
sequenceI :: Monad m => Iteratee e ao m ai -> Enumeratee e ao ai m b
sequenceI i = checkDone check where
    check k = finished >>= \f -> if f
        then yield (Continue k) EOF
        else step k
    step k = i >>= \v -> k (Chunk [v]) >>== sequenceI i
</pre></blog-code><p>A join combinator is useful for "extracting" an output stream from an enumeratee's result.</p><blog-code syntax=haskell><pre>
joinI :: Monad m => Iteratee e a m (Step e a' m b) -> Iteratee e a m b
joinI outer = outer >>= check where
    check (Continue k) = k EOF >>== check
    check (Yield x _) = return x
    check (Error e) = throwError e
</pre></blog-code></blog-section><blog-footnotes><hr><ol><li id=fn:1><p>Oleg Kiselyov – <a href=http://www.haskell.org/pipermail/haskell-cafe/2008-September/047738.html>Lazy vs correct IO</a></p></li></ol></blog-footnotes></blog-article>2010-01-18T19:40:37Z