Filesystem in Userspace (FUSE) is a protocol for implementing UNIX-style filesystems outside of the OS kernel. It was initially developed for Linux, and has seen some limited adoption by other kernels.
I wanted to write a library for the userspace side of FUSE as an exercise in learning Rust, but got stuck on a lack of documentation regarding the protocol, its versions, and how it varies across kernels. This page contains my notes on the FUSE protocol.
Similar documents elsewhere on the web:
The FUSE protocol is versioned with a (major, minor) tuple. Backwards compatibility is freely broken in "minor" releases, so it's not a SemVer-style version. I tend to think of it as separate "handshake version" and "protocol version", each being equivalent to a SemVer major version.
In this version table, I use Linux as the reference version and note when other kernel-side implementations break Linux compatibility. In practice this seems to happen only for the Darwin (macOS) port, which is maintained as a third-party kernel module.
| Version | Date | Release |
|---|---|---|
| v7.8-darwin1 | 2008-04-25 | MacFUSE 1.5 |
| v7.8-darwin2 | 2008-06-30 | MacFUSE 1.7 |
| v7.8-darwin3 | 2008-12-08 | MacFUSE 2.0 |
| v7.19-darwin1 | 2015-01-11 | FUSE for OS X 3.0.0 |
| v7.19-darwin2 | 2015-03-09 | FUSE for OS X 3.0.2 |
On FreeBSD, FUSE was maintained as a "fusefs-kmod" port until it was merged into the kernel in FreeBSD v10.0 as sys/fs/fuse/fuse_kernel.h.
The FreeBSD developers have kept their implementation pinned to v7.8 to maintain ABI compatibility, instead of using FUSE version negotiation. There are occasional ABI-compatible changes relative to the Linux version, such as removing fuse_mknod_in or splitting the xattr request/response types.
The first FUSE implementation for Darwin was MacFUSE, written by Amit Singh.
The MacFUSE kernel ABI was based on FUSE v7.8 but had a few Darwin-specific struct fields and opcodes. The last stable release of MacFUSE I can find was v2.0, released in 2008.
In 2011, Benjamin Fleischer forked MacFUSE into OSXFUSE and resumed development. In 2015 it was rebased to the v7.19 ABI, keeping the Darwin-specific fields and opcodes. As of 2018 there have been no further changes to the kernel ABI.
Extended attributes or "xattrs" are key-value items that may be associated with filesystem nodes. Keys are C-style null-terminated strings; values are arbitrary byte blobs. See xattr(7) for more details on their use and semantics.
FUSE supports xattrs through four opcodes that directly map to libattr functions, documented by:
Because no UNIX API would be complete without some sharp corners to stub your toes on, the libattr authors invented ENOATTR. There is no such error code defined in the POSIX standard and it's not guaranteed to be defined by system headers, so libattr defines ENOATTR equal to ENODATA if it's not already set:
ENOATTR The named attribute does not exist, or the process has no access to this attribute. (ENOATTR is defined to be a synonym for ENODATA in
Read that again!
The API of extended attributes depends on the content of third-party userland headers!
And if that's not enough, ENODATA is itself optional: UNIX systems that don't implement the XSI STREAMS Option Group might not have a definition of ENODATA. FreeBSD is in this category.
In practice I've found it easiest to hardcode the error behavior to whatever that platform's native filesystems do, even if the resulting behavior deviates from the libattr manpages.
If this isn't handled well by the FUSE library, then filesystem authors will try to do it themselves and probably get it wrong. See the mailing list thread "ENOATTR vs ENODATA" for the trouble caused by a filesystem assuming ENODATA == ENOATTR.
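A minimal sketch of what "hardcode the error behavior" could look like inside a FUSE library. The Platform enum and the function name are hypothetical, and the numeric values are my reading of each platform's errno.h, so check them against the real headers before relying on them.

```rust
/// Platforms this sketch knows about (hypothetical enum, for illustration only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Platform {
    Linux,
    FreeBsd,
    Darwin,
}

// Assumed errno values, copied by hand from each platform's <errno.h>.
const LINUX_ENODATA: i32 = 61;
const FREEBSD_ENOATTR: i32 = 87;
const DARWIN_ENOATTR: i32 = 93;

/// Error code to report when a requested xattr does not exist,
/// chosen to match what the platform's native filesystems return.
pub fn xattr_not_found_errno(platform: Platform) -> i32 {
    match platform {
        Platform::Linux => LINUX_ENODATA,
        Platform::FreeBsd => FREEBSD_ENOATTR,
        Platform::Darwin => DARWIN_ENOATTR,
    }
}
```

Keeping this table in the library means a filesystem author asks for "attribute not found" and never has to spell ENOATTR or ENODATA themselves.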
Character Devices in Userspace (CUSE) lets a FUSE server export operations as a Linux character device instead of a filesystem. Most of the behavior is the same, and the CUSE "mount" acts like a filesystem containing a single file.
Differences from standard FUSE:
The server opens /dev/cuse directly; there isn't a suid helper like for filesystem mounts.
This seems to exist so the kernel can use a FUSE server to interpret bytes on a block device. ntfs-3g is the main user?
The user can mount a "control filesystem" to inspect FUSE state and forcefully abort an existing FUSE server mount.
    mount -t fusectl none /sys/fs/fuse/connections
    ls /sys/fs/fuse/connections
    # 42/ 44/ 46/ 47/ 48/ 50/ 51/ 52/ 53/
    ls /sys/fs/fuse/connections/42
    # abort congestion_threshold max_background waiting
There are some basic docs at https://www.kernel.org/doc/Documentation/filesystems/fuse.txt
Background: When a file is opened, the Linux kernel creates a "file description" for the I/O state, and returns a "file descriptor" to userland. That descriptor can be freely passed to the dup(2) functions to duplicate the descriptor, but all the duplicates still share a single underlying description.
The FUSE kernel driver implicitly locks access to the /dev/fuse file descriptor so that each read() and write() syscall is atomic. This implies that multiple threads can safely share the descriptor, but also that they will face lock contention and reduced performance.
To get the best performance out of a multi-threaded filesystem server, open /dev/fuse once as a "session FD" and again in each thread as "worker FDs". After initializing the session with a standard FUSE handshake, the workers can be associated with the session by calling ioctl(worker_fd, FUSE_DEV_IOC_CLONE, &session_fd).
This allows multiple threads to serve FUSE requests without contending for the descriptor lock.
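Rust's standard library has no ioctl wrapper, so a server written in Rust needs the numeric request code. Here is a sketch of how I believe FUSE_DEV_IOC_CLONE is derived, assuming the common Linux ioctl encoding (x86, ARM; some architectures lay the bits out differently) and a magic number of 229 as I read it in the Linux FUSE header. The helper name ior is hypothetical.

```rust
// Assumed Linux ioctl request layout on common architectures:
// bits 31..30 direction, 29..16 argument size, 15..8 magic, 7..0 command number.
const IOC_READ: u32 = 2;

/// Equivalent of the C `_IOR(magic, nr, type)` macro, with the size given in bytes.
const fn ior(magic: u32, nr: u32, size: u32) -> u32 {
    (IOC_READ << 30) | (size << 16) | (magic << 8) | nr
}

/// Magic number for /dev/fuse ioctls (assumed to be 229).
const FUSE_DEV_IOC_MAGIC: u32 = 229;

/// `_IOR(FUSE_DEV_IOC_MAGIC, 0, uint32_t)`: the argument is a pointer to
/// a 4-byte integer holding the session FD.
pub const FUSE_DEV_IOC_CLONE: u32 = ior(FUSE_DEV_IOC_MAGIC, 0, 4);

// Per worker thread, the sequence would then be roughly:
//   1. open("/dev/fuse") to get worker_fd
//   2. ioctl(worker_fd, FUSE_DEV_IOC_CLONE, &session_fd)
//   3. loop { read a request from worker_fd; write the reply to worker_fd }
```

The ioctl call itself needs FFI (for example through the libc crate), which I've left out to keep the sketch dependency-free.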
If the default_permissions mount option is unset, the kernel will delegate permission checks to the FUSE server.
The mode is a bitmask of requested operations, matching the semantics of the POSIX access() syscall.
The response body is empty, but the return value is significant:
-ENOSYS means the access is allowed, and all future accesses are also allowed. The kernel may skip sending further access calls to the FUSE server.
-EACCES means the access is denied due to lack of permissions.
Other return codes are OS-dependent.
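As a rough illustration of the mask semantics, here is a sketch of a server-side check. The function name and the "allowed bits" parameter are hypothetical; a real server would derive the allowed bits from the node's owner, group, and mode. The constants are the usual access(2) values, and EACCES is the Linux number.

```rust
// Mask bits as used by access(2).
pub const F_OK: u32 = 0; // existence check only
pub const X_OK: u32 = 1;
pub const W_OK: u32 = 2;
pub const R_OK: u32 = 4;

const EACCES: i32 = 13; // Linux value; assumed, check your platform

/// `allowed` is the set of R_OK/W_OK/X_OK bits the caller may use on the node.
/// Returns the error field for the reply: 0 if every requested bit is
/// allowed, -EACCES otherwise.
pub fn access_reply(allowed: u32, mask: u32) -> i32 {
    let requested = mask & (R_OK | W_OK | X_OK);
    if (requested & !allowed) == 0 {
        0
    } else {
        -EACCES
    }
}
```

For example, a read-only node would pass allowed = R_OK, so a request for R_OK | W_OK is denied while R_OK alone (or F_OK) succeeds.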
Sent just before the kernel unmounts the filesystem. Might be received by the server after the kernel has terminated the session.
No request or response.
Reduces the reference count of a lookup'd inode.
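A sketch of the bookkeeping this implies on the server side. The LookupCounts type and its method names are hypothetical; the nlookup parameter corresponds to the field of the same name in fuse_forget_in, which tells the server how many lookups to drop at once.

```rust
use std::collections::HashMap;

/// Per-node lookup counts, keyed by node ID.
#[derive(Default)]
pub struct LookupCounts {
    counts: HashMap<u64, u64>,
}

impl LookupCounts {
    /// Call once for every successful reply that hands `nodeid` to the kernel.
    pub fn remember(&mut self, nodeid: u64) {
        *self.counts.entry(nodeid).or_insert(0) += 1;
    }

    /// Apply a forget request carrying `nlookup`. Returns true when the
    /// count reaches zero and the server may release the node.
    pub fn forget(&mut self, nodeid: u64, nlookup: u64) -> bool {
        let remaining = match self.counts.get_mut(&nodeid) {
            Some(count) => {
                // Saturate so a buggy or duplicate forget can't underflow.
                *count = count.saturating_sub(nlookup);
                *count
            }
            None => return false, // unknown node: nothing to release
        };
        if remaining == 0 {
            self.counts.remove(&nodeid);
            true
        } else {
            false
        }
    }
}
```

Two lookups followed by two single-count forgets release the node on the second forget; a third forget for the same node is ignored.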
Negotiated features ("flags"):
The request is a NUL-terminated bytestring. Incoming name length is constrained to some maximum length by the kernel:
Response is a fuse_entry_out.
Before v7.6, fuse_entry_out::nodeid had to be non-zero. Lookup failure was handled by ENOENT only. This restriction was lifted in v7.6, so that a lookup response with nodeid == 0 meant a cacheable lookup failure.
If fuse_attr::mode isn't a valid file type (S_IFREG etc.), the kernel will drop the response and won't enqueue a FUSE_FORGET. A server that thought the response was successful would be stuck with that refcount forever.
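A library can guard against this by validating the mode before it counts the lookup. This is a sketch with a hypothetical function name; the constants are the standard octal file-type values from sys/stat.h, and the set of accepted types is my understanding of what the Linux driver checks.

```rust
// File-type bits of st_mode, as defined in <sys/stat.h>.
const S_IFMT: u32 = 0o170000; // mask covering the file-type bits
const S_IFIFO: u32 = 0o010000;
const S_IFCHR: u32 = 0o020000;
const S_IFDIR: u32 = 0o040000;
const S_IFBLK: u32 = 0o060000;
const S_IFREG: u32 = 0o100000;
const S_IFLNK: u32 = 0o120000;
const S_IFSOCK: u32 = 0o140000;

/// Returns true if the file-type bits of `mode` name one of the known types.
/// A server should refuse to send (and refuse to refcount) an entry that fails this.
pub fn has_valid_file_type(mode: u32) -> bool {
    matches!(
        mode & S_IFMT,
        S_IFIFO | S_IFCHR | S_IFDIR | S_IFBLK | S_IFREG | S_IFLNK | S_IFSOCK
    )
}
```

The classic mistake this catches is filling in only the permission bits, such as 0o644, and forgetting to OR in S_IFREG.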
Changes to fuse_attr propagate to all the structs that contain it, including fuse_attr_out. The fuse kernel header has constants like FUSE_COMPAT_ENTRY_OUT_SIZE set to the "old" struct size.
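A sketch of how a server might pick the response size from the negotiated minor version. The sizes (120/128 and 96/104 bytes) and the v7.9 cutoff are my reading of the Linux header and driver, so treat them as assumptions to verify; the function names are hypothetical.

```rust
// Assumed sizes in bytes: the COMPAT values are the "old" layouts, the
// others include the larger fuse_attr.
const FUSE_COMPAT_ENTRY_OUT_SIZE: usize = 120;
const FUSE_ENTRY_OUT_SIZE: usize = 128;
const FUSE_COMPAT_ATTR_OUT_SIZE: usize = 96;
const FUSE_ATTR_OUT_SIZE: usize = 104;

/// Minor version at which fuse_attr grew (assumed to be v7.9).
const ATTR_GREW_AT_MINOR: u32 = 9;

/// Number of bytes of fuse_entry_out to send for the negotiated minor version.
pub fn entry_out_size(minor: u32) -> usize {
    if minor < ATTR_GREW_AT_MINOR {
        FUSE_COMPAT_ENTRY_OUT_SIZE
    } else {
        FUSE_ENTRY_OUT_SIZE
    }
}

/// Number of bytes of fuse_attr_out to send for the negotiated minor version.
pub fn attr_out_size(minor: u32) -> usize {
    if minor < ATTR_GREW_AT_MINOR {
        FUSE_COMPAT_ATTR_OUT_SIZE
    } else {
        FUSE_ATTR_OUT_SIZE
    }
}
```

Both structs grow by the same 8 bytes because the only thing that changed is the fuse_attr embedded in them.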
A lookup response with nodeid: 1 might also send an EIO to the client, because this node ID is reserved for the root node.
3.3.17 Procedure 17: READDIRPLUS - Extended read from directory
FUSE supports mandatory locking and BSD flock if that's your thing <http://0pointer.de/blog/projects/locking.html>.
The FreeBSD release notes describe their implementation as "state of the art", but the ABI was ~7 years behind Linux at the time it was released.
Amit worked at Google from 2006 to 2011, left to found a startup, got acquired by Facebook in 2013, and vanished from the Internet. I assume he's either racing yachts around a private island or trapped in Mark Zuckerberg's human zoo.