Filesystem in Userspace (FUSE) is a protocol for implementing UNIX-style filesystems outside of the OS kernel. It was initially developed for Linux, and has seen some limited adoption by other kernels.
I wanted to write a library for the userspace side of FUSE as an exercise in learning Rust, but got stuck on a lack of documentation regarding the protocol, its versions, and how it varies across kernels. This page contains my notes on the FUSE protocol.
Similar documents elsewhere on the web:
Other resources:
The FUSE protocol is versioned with a (major, minor) tuple. Backwards compatibility is freely broken in "minor" releases, so it's not a SemVer-style version. I tend to think of it as separate "handshake version" and "protocol version", each being equivalent to a SemVer major version.
Protocol Version | Linux Release | Date |
---|---|---|
v7.2 | 2.6.14 | 2005-10-27 |
v7.3 (diff) | 2.6.15 | 2006-01-03 |
v7.6 (diff) | 2.6.16 | 2006-03-20 |
v7.7 (diff) | 2.6.18 | 2006-09-20 |
v7.8 (diff) | 2.6.20 | 2007-02-05 |
v7.9 (diff) | 2.6.24 | 2008-01-24 |
v7.10 (diff) | 2.6.28 | 2008-12-25 |
v7.11 (diff) | 2.6.29 | 2009-03-23 |
v7.12 (diff) | 2.6.31 | 2009-09-09 |
v7.13 (diff) | 2.6.32 | 2009-12-03 |
v7.14 (diff) | 2.6.35 | 2010-08-01 |
v7.15 (diff) | 2.6.36 | 2010-10-20 |
v7.16 (diff) | 2.6.38 | 2011-03-14 |
v7.17 (diff) | 3.1 | 2011-10-24 |
v7.18 (diff) | 3.3 | 2012-03-18 |
v7.19 (diff) | 3.5 | 2012-07-21 |
v7.20 (diff) | 3.6 | 2012-09-30 |
v7.21 (diff) | 3.9 | 2013-04-28 |
v7.22 (diff) | 3.10 | 2013-06-30 |
v7.23 (diff) | 3.15 | 2014-06-08 |
v7.24 (diff) | 4.5 | 2016-03-13 |
v7.25 (diff) | 4.7 | 2016-07-24 |
v7.26 (diff) | 4.9 | 2016-12-11 |
v7.27 (diff) | 4.18 | 2018-08-12 |
v7.28 (diff) | 4.20 | 2018-12-23 |
v7.29 (diff) | 5.1 | 2019-05-05 |
v7.31 (diff) | 5.2 | 2019-07-07 |
v7.32 (diff) | 5.10 | 2020-12-13 |
v7.33 (diff) | 5.11 | 2021-02-14 |
v7.34 (diff) | 5.14 | 2021-08-29 |
v7.35 (diff) | 5.16 | 2022-01-09 |
v7.36 (diff) | 5.17 | 2022-03-20 |
v7.37 (diff) | 6.1 | 2022-12-11 |
v7.38 (diff) | 6.2 | 2023-02-19 |
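To illustrate treating the two numbers as separate "handshake" and "protocol" versions, here is a minimal sketch of picking the version to speak. The function name `negotiate_version` is hypothetical, and the rule it applies (majors must match, the smaller minor wins) is an assumption based on the comments in the Linux `fuse.h` header rather than anything stated in these notes.

```python
def negotiate_version(kernel, server_max):
    """Pick the (major, minor) both sides can speak, or None if there is none.

    Assumed rule: the major ("handshake") versions must match exactly, and
    the minor ("protocol") version is the smaller of the two offers.
    """
    k_major, k_minor = kernel
    s_major, s_minor = server_max
    if k_major != s_major:
        # This sketch does not model renegotiation across major versions.
        return None
    return (k_major, min(k_minor, s_minor))


if __name__ == "__main__":
    # A v7.38 kernel talking to a server that only knows v7.31.
    print(negotiate_version((7, 38), (7, 31)))
```

Note that a plain tuple comparison such as `(7, 38) > (7, 31)` only orders the versions; it says nothing about compatibility, which is why the minor is clamped instead.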
TODO
TODO
TODO
TODO
Extended attributes or "xattrs" are key-value items that may be associated with filesystem nodes. Keys are C-style null-terminated strings; values are arbitrary byte blobs. See xattr(7) for more details on their use and semantics.
FUSE supports xattrs through four opcodes that directly map to libattr functions, documented by:
FUSE_GETXATTR
FUSE_SETXATTR
FUSE_LISTXATTR
FUSE_REMOVEXATTR
Because no UNIX API would be complete without some sharp corners to stub your toes on, the libattr authors invented ENOATTR. There is no such error code defined in the POSIX standard and it's not guaranteed to be defined by system headers, so libattr defines ENOATTR equal to ENODATA if it's not already set:
ENOATTR The named attribute does not exist, or the process has no access to this attribute. (ENOATTR is defined to be a synonym for ENODATA in <attr/xattr.h>.)
Read that again!
The API of extended attributes depends on the content of third-party userland headers!
And if that's not enough, ENODATA is itself optional: UNIX systems that don't implement the XSI STREAMS Option Group might not have a definition of ENODATA. FreeBSD is in this category.
Platform | ENODATA | ENOATTR |
---|---|---|
Linux (x86-64) | 61 | |
Linux (sparc) | 111 | |
FreeBSD (x86-64) | | 87 |
In practice I've found it easiest to hardcode the error behavior to whatever that platform's native filesystems do, even if the resulting behavior deviates from the libattr manpages.
If this isn't handled well by the FUSE library, then filesystem authors will try to do it themselves and probably get it wrong. See [tech-kern@netbsd.org] ENOATTR vs ENODATA for the trouble caused by a filesystem assuming ENODATA == ENOATTR.
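One way a FUSE library could hardcode the per-platform behavior is a small lookup table. The sketch below is only an illustration: the function name, the platform labels used as keys, and the fallback policy are all assumptions, and the numeric values are copied from the table above.

```python
import errno

# Error number reported for a missing xattr, per platform.
# Values come from the table above; the key labels are hypothetical.
XATTR_MISSING_ERRNO = {
    ("linux", "x86_64"): 61,    # ENODATA
    ("linux", "sparc"): 111,    # ENODATA
    ("freebsd", "x86_64"): 87,  # ENOATTR
}


def xattr_missing_errno(system, machine):
    """Return the errno to send when an extended attribute does not exist."""
    known = XATTR_MISSING_ERRNO.get((system, machine))
    if known is not None:
        return known
    # Unknown platform: fall back to whatever the local headers expose,
    # preferring ENOATTR when it is defined at all.
    for name in ("ENOATTR", "ENODATA"):
        value = getattr(errno, name, None)
        if value is not None:
            return value
    raise LookupError("no ENOATTR or ENODATA on this platform")


if __name__ == "__main__":
    print(xattr_missing_errno("linux", "x86_64"))
```

Keeping the table inside the library means a filesystem author never has to compare ENODATA and ENOATTR directly.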
Character Devices in Userspace (CUSE) lets a FUSE server export operations as a Linux character device instead of a filesystem. Most of the behavior is the same, and the CUSE "mount" acts like a filesystem containing a single file.
Differences from standard FUSE:
- The server opens /dev/cuse directly; there isn't a suid helper like for filesystem mounts.
- The handshake uses CUSE_INIT.
- […] FUSE_READ, FUSE_WRITE, and FUSE_IOCTL.

TODO
This seems to exist so the kernel can use a FUSE server to interpret bytes on a block device. ntfs-3g is the main user?
TODO
The user can mount a "control filesystem" to inspect FUSE state and forcefully abort an existing FUSE server mount.
    mount -t fusectl none /sys/fs/fuse/connections
    ls /sys/fs/fuse/connections
    # 42/ 44/ 46/ 47/ 48/ 50/ 51/ 52/ 53/
    ls /sys/fs/fuse/connections/42
    # abort congestion_threshold max_background waiting
There are some basic docs at https://www.kernel.org/doc/Documentation/filesystems/fuse.txt
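The same inspection can be scripted. This is a rough sketch that assumes the control filesystem is already mounted at the path used above and that the `waiting`, `max_background`, and `congestion_threshold` files each hold a single integer; the function name `read_connections` is hypothetical.

```python
import os


def read_connections(root="/sys/fs/fuse/connections"):
    """Map each connection ID to the integer values of its readable files."""
    state = {}
    for entry in os.listdir(root):
        if not entry.isdigit():
            continue  # Only numbered directories are connections.
        values = {}
        for name in ("waiting", "max_background", "congestion_threshold"):
            try:
                with open(os.path.join(root, entry, name)) as f:
                    values[name] = int(f.read().strip())
            except (OSError, ValueError):
                # Missing or unreadable files are skipped, not guessed at.
                continue
        state[int(entry)] = values
    return state


if __name__ == "__main__":
    for conn_id, values in sorted(read_connections().items()):
        print(conn_id, values)
```

The `abort` file is deliberately left out of the loop, since it is used to kill a connection rather than to report state.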
Background: When a file is opened, the Linux kernel creates a "file description" for the I/O state, and returns a "file descriptor" to userland. That descriptor can be freely duplicated with the dup(2) family of functions, but every duplicate still refers to the same single underlying description.
The FUSE kernel driver implicitly locks access to the /dev/fuse file descriptor so that each read() and write() syscall is atomic. This implies that multiple threads can safely share the descriptor, but also that they will face lock contention and reduced performance.
To get the best performance out of a multi-threaded filesystem server, open /dev/fuse once as a "session FD" and again in each thread as "worker FDs". After initializing the session with a standard FUSE handshake, the workers can be associated with the session by calling ioctl(worker_fd, FUSE_DEV_IOC_CLONE, &session_fd).
This allows multiple threads to serve FUSE requests without contending for the descriptor lock.
TODO
TODO
TODO
name | value | version | |
---|---|---|---|
FUSE_LOOKUP | 1 | ||
FUSE_FORGET | 2 | ||
FUSE_GETATTR | 3 | ||
FUSE_SETATTR | 4 | ||
FUSE_READLINK | 5 | ||
FUSE_SYMLINK | 6 | ||
(unused) | 7 | ||
FUSE_MKNOD | 8 | ||
FUSE_MKDIR | 9 | ||
FUSE_UNLINK | 10 | ||
FUSE_RMDIR | 11 | ||
FUSE_RENAME | 12 | ||
FUSE_LINK | 13 | ||
FUSE_OPEN | 14 | ||
FUSE_READ | 15 | ||
FUSE_WRITE | 16 | ||
FUSE_STATFS | 17 | ||
FUSE_RELEASE | 18 | ||
(unused) | 19 | ||
FUSE_FSYNC | 20 | ||
FUSE_SETXATTR | 21 | ||
FUSE_GETXATTR | 22 | ||
FUSE_LISTXATTR | 23 | ||
FUSE_REMOVEXATTR | 24 | ||
FUSE_FLUSH | 25 | ||
FUSE_INIT | 26 | ||
FUSE_OPENDIR | 27 | ||
FUSE_READDIR | 28 | ||
FUSE_RELEASEDIR | 29 | ||
FUSE_FSYNCDIR | 30 | ||
FUSE_GETLK | 31 | v7.7 | |
FUSE_SETLK | 32 | v7.7 | |
FUSE_SETLKW | 33 | v7.7 | |
FUSE_ACCESS | 34 | v7.3 | |
FUSE_CREATE | 35 | v7.3 | |
FUSE_INTERRUPT | 36 | v7.7 | |
FUSE_BMAP | 37 | v7.8 | |
FUSE_DESTROY | 38 | v7.8 | |
FUSE_IOCTL | 39 | v7.11 | |
FUSE_POLL | 40 | v7.11 | |
FUSE_NOTIFY_REPLY | 41 | v7.15 | |
FUSE_BATCH_FORGET | 42 | v7.16 | |
FUSE_FALLOCATE | 43 | v7.19 | |
FUSE_READDIRPLUS | 44 | v7.21 | |
FUSE_RENAME2 | 45 | v7.23 | |
FUSE_LSEEK | 46 | v7.24 | |
FUSE_COPY_FILE_RANGE | 47 | v7.28 | |
CUSE_INIT | 4096 | v7.12 |
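As an illustration of how these values are used, the sketch below reads the opcode out of a request header. The header layout (`len`, `opcode`, `unique`, `nodeid`, `uid`, `gid`, `pid`, padding; 40 bytes in native byte order) is an assumption taken from the Linux `fuse.h` header and is not described in these notes. The names `Opcode`, `IN_HEADER`, and `parse_in_header` are hypothetical, and only a few opcodes from the table are included.

```python
import struct
from enum import IntEnum


class Opcode(IntEnum):
    # A small subset of the table above.
    FUSE_LOOKUP = 1
    FUSE_FORGET = 2
    FUSE_GETATTR = 3
    FUSE_OPEN = 14
    FUSE_READ = 15
    FUSE_INIT = 26
    CUSE_INIT = 4096


# Assumed fuse_in_header layout: two u32, two u64, four u32, no padding gaps.
IN_HEADER = struct.Struct("=IIQQIIII")


def parse_in_header(buf):
    """Decode the fixed-size header at the start of a request buffer."""
    length, opcode, unique, nodeid, uid, gid, pid, _pad = IN_HEADER.unpack_from(buf)
    try:
        op = Opcode(opcode)
    except ValueError:
        op = None  # Not in this sketch's subset.
    return {"len": length, "opcode": op, "raw_opcode": opcode,
            "unique": unique, "nodeid": nodeid,
            "uid": uid, "gid": gid, "pid": pid}


if __name__ == "__main__":
    sample = IN_HEADER.pack(56, 26, 1, 0, 0, 0, 0, 0)
    print(parse_in_header(sample)["opcode"].name)
```

Keeping `raw_opcode` alongside the enum lets a server log and reject opcodes it doesn't recognise instead of crashing on them.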
If the default_permissions mount option is unset, the kernel will delegate permission checks to the FUSE server.
The mode is a bitmask of requested operations, matching the semantics of the POSIX access() syscall.
Permission | Mode bit |
---|---|
execute | 0x1 |
write | 0x2 |
read | 0x4 |
The response body is empty, but the return value is significant:
- -ENOSYS means the access is allowed, and all future accesses are also allowed. The kernel may skip sending further access calls to the FUSE server.
- -EACCES means the access is denied due to lack of permissions.

Other return codes are OS-dependent.
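The mask handling can be sketched as below. The helper names are hypothetical, and the use of `0` as the "allowed" reply is an assumption based on the usual FUSE convention of a zero error code meaning success; it isn't spelled out above.

```python
import errno

# Mode bits from the table above.
ACCESS_BITS = (("read", 0x4), ("write", 0x2), ("execute", 0x1))


def requested_permissions(mask):
    """List the permission names requested by a FUSE_ACCESS mask."""
    return [name for name, bit in ACCESS_BITS if mask & bit]


def access_reply(mask, allowed_mask):
    """Return the reply error: 0 if every requested bit is allowed, else -EACCES."""
    if mask & ~allowed_mask & 0x7:
        return -errno.EACCES
    return 0


if __name__ == "__main__":
    print(requested_permissions(0x6))
    print(access_reply(0x2, allowed_mask=0x5))
```

A server that never wants to see these requests could instead always answer -ENOSYS, per the list above.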
TODO
TODO
TODO
TODO
Sent just before the kernel unmounts the filesystem. Might be received by the server after the kernel has terminated the session.
No request or response.
TODO
TODO
Reduces the reference count of a lookup'd inode.
TODO
TODO
TODO
TODO
Negotiated features ("flags"):
feature | bitmask | version |
---|---|---|
FUSE_ASYNC_READ | 0x1 | v7.6 |
FUSE_POSIX_LOCKS | 0x2 | v7.7 |
FUSE_FILE_OPS | 0x4 | v7.9 |
FUSE_ATOMIC_O_TRUNC | 0x8 | v7.9 |
FUSE_EXPORT_SUPPORT | 0x10 | v7.10 |
FUSE_BIG_WRITES | 0x20 | v7.10 |
FUSE_DONT_MASK | 0x40 | v7.12 |
FUSE_SPLICE_WRITE | 0x80 | v7.20 |
FUSE_SPLICE_MOVE | 0x100 | v7.20 |
FUSE_SPLICE_READ | 0x200 | v7.20 |
FUSE_FLOCK_LOCKS | 0x400 | v7.17 |
FUSE_HAS_IOCTL_DIR | 0x800 | v7.20 |
FUSE_AUTO_INVAL_DATA | 0x1000 | v7.20 |
FUSE_DO_READDIRPLUS | 0x2000 | v7.21 |
FUSE_READDIRPLUS_AUTO | 0x4000 | v7.21 |
FUSE_ASYNC_DIO | 0x8000 | v7.22 |
FUSE_WRITEBACK_CACHE | 0x10000 | v7.23 |
FUSE_NO_OPEN_SUPPORT | 0x20000 | v7.24 |
FUSE_PARALLEL_DIROPS | 0x40000 | v7.25 |
FUSE_HANDLE_KILLPRIV | 0x80000 | v7.26 |
FUSE_POSIX_ACL | 0x100000 | v7.26 |
FUSE_ABORT_ERROR | 0x200000 | v7.27 |
FUSE_MAX_PAGES | 0x400000 | v7.28 |
FUSE_CACHE_SYMLINKS | 0x800000 | v7.28 |
FUSE_NO_OPENDIR_SUPPORT | 0x1000000 | v7.29 |
FUSE_EXPLICIT_INVAL_DATA | 0x2000000 | v7.30 |
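Because each flag in the table occupies the next bit, a decoder can be built from the names in order. This is a sketch only: the function names are hypothetical, and the negotiation rule in `negotiate_flags` (a server may enable only what the kernel offered) is an assumption, not something documented above.

```python
# Flag names in bit order (bit 0 first), taken from the table above.
FLAG_NAMES = [
    "FUSE_ASYNC_READ", "FUSE_POSIX_LOCKS", "FUSE_FILE_OPS",
    "FUSE_ATOMIC_O_TRUNC", "FUSE_EXPORT_SUPPORT", "FUSE_BIG_WRITES",
    "FUSE_DONT_MASK", "FUSE_SPLICE_WRITE", "FUSE_SPLICE_MOVE",
    "FUSE_SPLICE_READ", "FUSE_FLOCK_LOCKS", "FUSE_HAS_IOCTL_DIR",
    "FUSE_AUTO_INVAL_DATA", "FUSE_DO_READDIRPLUS", "FUSE_READDIRPLUS_AUTO",
    "FUSE_ASYNC_DIO", "FUSE_WRITEBACK_CACHE", "FUSE_NO_OPEN_SUPPORT",
    "FUSE_PARALLEL_DIROPS", "FUSE_HANDLE_KILLPRIV", "FUSE_POSIX_ACL",
    "FUSE_ABORT_ERROR", "FUSE_MAX_PAGES", "FUSE_CACHE_SYMLINKS",
    "FUSE_NO_OPENDIR_SUPPORT", "FUSE_EXPLICIT_INVAL_DATA",
]


def decode_flags(flags):
    """Return the names of all known flags set in the bitmask."""
    return [name for bit, name in enumerate(FLAG_NAMES) if flags & (1 << bit)]


def negotiate_flags(kernel_flags, server_wanted):
    """Assumed rule: keep only the wanted flags that the kernel also offered."""
    return kernel_flags & server_wanted


if __name__ == "__main__":
    print(decode_flags(negotiate_flags(0x21, 0x30)))
```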
TODO
TODO
TODO
TODO
The request is a NUL-terminated bytestring. Incoming name length is constrained to some maximum length by the kernel:
- […] (FUSE_NAME_MAX)
- […] (MAXNAMLEN)

Response is a fuse_entry_out.
Notes:
- Before v7.6, fuse_entry_out::nodeid had to be non-zero. Lookup failure was handled by ENOENT only. This restriction was lifted in v7.6, so that a lookup response with nodeid == 0 meant a cacheable lookup failure.
- Each successful lookup increments the node's reference count, which the kernel later releases with FUSE_FORGET.
- If fuse_attr::mode isn't a valid file type (S_IFREG etc), the kernel will drop the response and won't enqueue a FUSE_FORGET. A server that thought the response was successful would be stuck with that refcount forever.
- Changes to fuse_attr propagate to all the structs that contain it, including fuse_attr_out. The fuse kernel header has constants like FUSE_COMPAT_ENTRY_OUT_SIZE set to the "old" struct size.
- A response with nodeid: 1 might also send an EIO to the client, because this node ID is reserved for the root node.

TODO
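The FUSE_LOOKUP notes above suggest validating a response before counting it. Below is a minimal sketch of server-side bookkeeping under those notes. The class and method names are hypothetical, the set of accepted file types is an assumption about what the kernel treats as valid, and `nlookup` stands for the count carried by a FUSE_FORGET request.

```python
import stat

# File types assumed to be accepted by the kernel.
VALID_TYPES = {stat.S_IFREG, stat.S_IFDIR, stat.S_IFLNK, stat.S_IFCHR,
               stat.S_IFBLK, stat.S_IFIFO, stat.S_IFSOCK}


def is_valid_file_type(mode):
    """True if the mode's type bits name a known file type."""
    return stat.S_IFMT(mode) in VALID_TYPES


class LookupCounts:
    """Tracks how many lookups the kernel still holds for each node ID."""

    ROOT_ID = 1  # Reserved for the root node.

    def __init__(self):
        self._counts = {}

    def record_lookup(self, nodeid, mode):
        """Count a lookup reply, refusing ones the kernel would drop."""
        if nodeid == self.ROOT_ID:
            raise ValueError("node ID 1 is reserved for the root node")
        if not is_valid_file_type(mode):
            raise ValueError("mode has no valid file type; no FORGET would follow")
        self._counts[nodeid] = self._counts.get(nodeid, 0) + 1

    def forget(self, nodeid, nlookup):
        """Drop nlookup references; return True once the node can be freed."""
        remaining = self._counts.get(nodeid, 0) - nlookup
        if remaining > 0:
            self._counts[nodeid] = remaining
            return False
        self._counts.pop(nodeid, None)
        return True
```

Rejecting a bad mode before incrementing the count avoids the stuck-refcount case described in the notes.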
TODO
TODO
TODO
TODO
TODO
TODO
TODO
TODO
TODO
https://tools.ietf.org/html/rfc1813#section-3.3.17
3.3.17 Procedure 17: READDIRPLUS - Extended read from directory
TODO
TODO
TODO
TODO
TODO
https://lwn.net/Articles/606237/
TODO
TODO
TODO
https://sourceforge.net/p/fuse/mailman/message/35018434/
FUSE supports mandatory locking and BSD flock if that's your thing < http://0pointer.de/blog/projects/locking.html >.
TODO
TODO
TODO
TODO
TODO
TODO