The FUSE Protocol

Filesystem in Userspace (FUSE) is a protocol for implementing UNIX-style filesystems outside of the OS kernel. It was initially developed for Linux, and has seen some limited adoption by other kernels.

I wanted to write a library for the userspace side of FUSE as an exercise in learning Rust, but got stuck on a lack of documentation regarding the protocol, its versions, and how it varies across kernels. This page contains my notes on the FUSE protocol.

Other resources:

  • SibylFS is a userspace test suite for filesystems.

Versions

The FUSE protocol is versioned with a (major, minor) tuple. Backwards compatibility is freely broken in "minor" releases, so it's not a SemVer-style version. I tend to think of it as separate "handshake version" and "protocol version", each being equivalent to a SemVer major version.

Protocol Version   Linux Release   Date
v7.2               2.6.14          2005-10-27
v7.3 (diff)        2.6.15          2006-01-03
v7.6 (diff)        2.6.16          2006-03-20
v7.7 (diff)        2.6.18          2006-09-20
v7.8 (diff)        2.6.20          2007-02-05
v7.9 (diff)        2.6.24          2008-01-24
v7.10 (diff)       2.6.28          2008-12-25
v7.11 (diff)       2.6.29          2009-03-23
v7.12 (diff)       2.6.31          2009-09-09
v7.13 (diff)       2.6.32          2009-12-03
v7.14 (diff)       2.6.35          2010-08-01
v7.15 (diff)       2.6.36          2010-10-20
v7.16 (diff)       2.6.38          2011-03-14
v7.17 (diff)       3.1             2011-10-24
v7.18 (diff)       3.3             2012-03-18
v7.19 (diff)       3.5             2012-07-21
v7.20 (diff)       3.6             2012-09-30
v7.21 (diff)       3.9             2013-04-28
v7.22 (diff)       3.10            2013-06-30
v7.23 (diff)       3.15            2014-06-08
v7.24 (diff)       4.5             2016-03-13
v7.25 (diff)       4.7             2016-07-24
v7.26 (diff)       4.9             2016-12-11
v7.27 (diff)       4.18            2018-08-12
v7.28 (diff)       4.20            2018-12-23
v7.29 (diff)       5.1             2019-05-05
v7.31 (diff)       5.2             2019-07-07
v7.32 (diff)       5.10            2020-12-13
v7.33 (diff)       5.11            2021-02-14
v7.34 (diff)       5.14            2021-08-29
v7.35 (diff)       5.16            2022-01-09
v7.36 (diff)       5.17            2022-03-20
v7.37 (diff)       6.1             2022-12-11
v7.38 (diff)       6.2             2023-02-19
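
Because minor versions run past 9, version labels need to be compared as integer tuples rather than as strings or decimals. The following is a minimal sketch of that idea; the helper names parse_version and at_least are made up for illustration and do not come from any FUSE library.

```python
def parse_version(label):
    """Parse a label such as "v7.10" into a (major, minor) tuple of ints."""
    major, minor = label.lstrip("v").split(".")
    return int(major), int(minor)


def at_least(negotiated, required):
    """True if both versions share a major and negotiated is not older."""
    return negotiated[0] == required[0] and negotiated[1] >= required[1]


if __name__ == "__main__":
    # Tuple comparison orders v7.10 after v7.9; string comparison does not.
    print(parse_version("v7.10") > parse_version("v7.9"))
    print("v7.10" > "v7.9")
```

A server could use a check like at_least(negotiated, (7, 21)) before relying on an opcode or struct layout introduced in that version.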

Wire Format

TODO

{ "name": "fuse_request_header", "fields": [
  {"name": "length", "type": "u32"},
  {"name": "opcode", "type": "u32"},
  {"name": "request_id", "type": "u64"},
  {"name": "node_id", "type": "u64"},
  {"name": "user_id", "type": "u32"},
  {"name": "group_id", "type": "u32"},
  {"name": "task_id", "type": "u32"},
  {"name": "padding", "type": "u32"}
] }

{ "name": "fuse_response_header", "fields": [
  {"name": "length", "type": "u32"},
  {"name": "error", "type": "i32"},
  {"name": "request_id", "type": "u64"}
] }
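
As a rough illustration of the two headers above, here is a sketch that decodes a request header and encodes a response header with Python's struct module. It assumes native byte order (the kernel and the server share a machine), that the length field counts the header plus any body, and that errors are sent as negated errno values; the function names are hypothetical.

```python
import struct

# "=" means native byte order, standard sizes, and no implicit padding.
REQUEST_HEADER = struct.Struct("=IIQQIIII")  # 40 bytes
RESPONSE_HEADER = struct.Struct("=IiQ")      # 16 bytes

REQUEST_FIELDS = ("length", "opcode", "request_id", "node_id",
                  "user_id", "group_id", "task_id", "padding")


def parse_request_header(buf):
    """Decode the first 40 bytes of a request into a dict."""
    return dict(zip(REQUEST_FIELDS, REQUEST_HEADER.unpack_from(buf)))


def pack_response(request_id, error=0, body=b""):
    """Encode a response header followed by an optional body."""
    length = RESPONSE_HEADER.size + len(body)  # assumed to include the header
    return RESPONSE_HEADER.pack(length, error, request_id) + body
```

The 40-byte size matches the "size": 40 used wherever the request header is embedded in the opcode structs.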

TODO

Mounting

/dev/fuse

TODO

fusermount

TODO

Special Topics

Extended Attributes

Extended attributes or "xattrs" are key-value items that may be associated with filesystem nodes. Keys are C-style null-terminated strings; values are arbitrary byte blobs. See xattr(7) for more details on their use and semantics.

FUSE supports xattrs through four opcodes (FUSE_SETXATTR, FUSE_GETXATTR, FUSE_LISTXATTR, and FUSE_REMOVEXATTR) that map directly to the libattr functions of the same names.

Because no UNIX API would be complete without some sharp corners to stub your toes on, the libattr authors invented ENOATTR. There is no such error code defined in the POSIX standard and it's not guaranteed to be defined by system headers, so libattr defines ENOATTR equal to ENODATA if it's not already set:

ENOATTR
       The named attribute does not exist, or the process has no
       access to this attribute. (ENOATTR is defined to be a synonym
       for ENODATA in <attr/xattr.h>.)

Read that again!

The API of extended attributes depends on the content of third-party userland headers!

And if that's not enough, ENODATA is itself optional – UNIX systems that don't implement the XSI STREAMS Option Group might not have a definition of ENODATA. FreeBSD is in this category.

Platform           ENODATA  ENOATTR
Linux (x86-64)     61       -
Linux (sparc)      111      -
FreeBSD (x86-64)   -        87

In practice I've found it easiest to hardcode the error behavior to whatever that platform's native filesystems do, even if the resulting behavior deviates from the libattr manpages.

If this isn't handled well by the FUSE library, then filesystem authors will try to do it themselves and probably get it wrong. See [tech-kern@netbsd.org] ENOATTR vs ENODATA for the trouble caused by a filesystem assuming ENODATA == ENOATTR.
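
The "hardcode it per platform" approach described above could be sketched as a lookup table. The numbers are copied from the platform table in this section; the function name and the (system, machine) keys are hypothetical labels for the table rows and are not guaranteed to match what platform.system() or platform.machine() report.

```python
# Error code returned for a missing attribute, per the table in this section.
MISSING_XATTR_ERRNO = {
    ("linux", "x86_64"): 61,    # ENODATA
    ("linux", "sparc"): 111,    # ENODATA
    ("freebsd", "x86_64"): 87,  # ENOATTR
}


def missing_xattr_errno(system, machine):
    """Return the native "no such attribute" errno, or fail loudly."""
    key = (system.lower(), machine.lower())
    try:
        return MISSING_XATTR_ERRNO[key]
    except KeyError:
        raise NotImplementedError(f"no known xattr errno for {key}") from None
```

Failing on unknown platforms is a deliberate choice here: guessing ENODATA == ENOATTR is exactly the assumption that caused trouble in the NetBSD thread.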

CUSE

Character Devices in Userspace (CUSE) lets a FUSE server export operations as a Linux character device instead of a filesystem. Most of the behavior is the same, and the CUSE "mount" acts like a filesystem containing a single file.

Differences from standard FUSE:

  • The server must open /dev/cuse directly; there is no suid helper like the one for filesystem mounts.
  • The kernel handshakes with CUSE_INIT.
  • The protocol has a reduced set of opcodes, primarily FUSE_READ, FUSE_WRITE, and FUSE_IOCTL.

Block Devices (fuseblk)

TODO

This seems to exist so the kernel can use a FUSE server to interpret bytes on a block device. ntfs-3g is the main user?

Debugging

TODO

The user can mount a "control filesystem" to inspect FUSE state and forcefully abort an existing FUSE server mount.

mount -t fusectl none /sys/fs/fuse/connections
ls /sys/fs/fuse/connections
# 42/  44/  46/  47/  48/  50/  51/  52/  53/
ls /sys/fs/fuse/connections/42
# abort  congestion_threshold  max_background  waiting

There are some basic docs at https://www.kernel.org/doc/Documentation/filesystems/fuse.txt
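
The same inspection can be scripted. This is a sketch that assumes the control filesystem is already mounted at the path used above, that each connection's waiting file holds a single integer, and that writing anything to abort tears the connection down (as the kernel documentation describes). Both function names are made up for illustration.

```python
import os

CONTROL_ROOT = "/sys/fs/fuse/connections"


def waiting_requests(root=CONTROL_ROOT):
    """Map each connection ID under root to its waiting request count."""
    counts = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name, "waiting")
        try:
            with open(path) as f:
                counts[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # connection vanished, or the file was unreadable
    return counts


def abort_connection(conn_id, root=CONTROL_ROOT):
    """Forcefully abort a connection by writing to its abort file."""
    with open(os.path.join(root, str(conn_id), "abort"), "w") as f:
        f.write("1")
```

Taking the root as a parameter keeps the helpers usable against a scratch directory, which is handy since the real files usually need elevated privileges.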

Multi-Threading

Background: When a file is opened, the Linux kernel creates a "file description" for the I/O state, and returns a "file descriptor" to userland. That descriptor can be freely duplicated with the dup(2) family of functions, but every duplicate refers to the same underlying description.

The FUSE kernel driver implicitly locks access to the /dev/fuse file descriptor so that each read() and write() syscall is atomic. This implies that multiple threads can safely share the descriptor, but also that they will face lock contention and reduced performance.

To get the best performance out of a multi-threaded filesystem server, open /dev/fuse once as a "session FD" and again in each thread as "worker FDs". After initializing the session with a standard FUSE handshake, the workers can be associated with the session by calling ioctl(worker_fd, FUSE_DEV_IOC_CLONE, &session_fd).

This allows multiple threads to serve FUSE requests without contending for the descriptor lock.

POSIX ACLs

TODO

Notifications

TODO

Locks

TODO

Opcodes

name                   value  version
FUSE_LOOKUP            1
FUSE_FORGET            2
FUSE_GETATTR           3
FUSE_SETATTR           4
FUSE_READLINK          5
FUSE_SYMLINK           6
-                      7
FUSE_MKNOD             8
FUSE_MKDIR             9
FUSE_UNLINK            10
FUSE_RMDIR             11
FUSE_RENAME            12
FUSE_LINK              13
FUSE_OPEN              14
FUSE_READ              15
FUSE_WRITE             16
FUSE_STATFS            17
FUSE_RELEASE           18
-                      19
FUSE_FSYNC             20
FUSE_SETXATTR          21
FUSE_GETXATTR          22
FUSE_LISTXATTR         23
FUSE_REMOVEXATTR       24
FUSE_FLUSH             25
FUSE_INIT              26
FUSE_OPENDIR           27
FUSE_READDIR           28
FUSE_RELEASEDIR        29
FUSE_FSYNCDIR          30
FUSE_GETLK             31     v7.7
FUSE_SETLK             32     v7.7
FUSE_SETLKW            33     v7.7
FUSE_ACCESS            34     v7.3
FUSE_CREATE            35     v7.3
FUSE_INTERRUPT         36     v7.7
FUSE_BMAP              37     v7.8
FUSE_DESTROY           38     v7.8
FUSE_IOCTL             39     v7.11
FUSE_POLL              40     v7.11
FUSE_NOTIFY_REPLY      41     v7.15
FUSE_BATCH_FORGET      42     v7.16
FUSE_FALLOCATE         43     v7.19
FUSE_READDIRPLUS       44     v7.21
FUSE_RENAME2           45     v7.23
FUSE_LSEEK             46     v7.24
FUSE_COPY_FILE_RANGE   47     v7.28
CUSE_INIT              4096   v7.12

FUSE_ACCESS

If the default_permissions mount option is unset, the kernel will delegate permission checks to the FUSE server.

{ "name": "fuse_access_request", "fields": [ {"embed": "fuse_request_header", "size": 40}, {"name": "mode", "type": "u32"}, {"name": "padding", "type": "u32"} ] }

The mode is a bitmask of requested operations, matching the semantics of the POSIX access() syscall.

Permission  Mode bit
execute     0x1
write       0x2
read        0x4

The response body is empty, but the return value is significant:

  • Returning 0 means the access is allowed.
  • Returning -ENOSYS means the access is allowed, and all future accesses are also allowed. The kernel may skip sending further access calls to the FUSE server.
  • Returning -EACCES means the access is denied due to lack of permissions.

Other return codes are OS-dependent.
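
To make the request and reply shapes concrete, here is a sketch of a FUSE_ACCESS handler. It assumes the body passed in starts right after the 40-byte request header, that the server already knows which permission bits it is willing to grant (the permitted argument is hypothetical), and that the reply is a bare response header in native byte order.

```python
import errno
import struct

ACCESS_BODY = struct.Struct("=II")       # mode, padding
RESPONSE_HEADER = struct.Struct("=IiQ")  # length, error, request_id

X_OK, W_OK, R_OK = 0x1, 0x2, 0x4


def reply_to_access(request_id, body, permitted):
    """Build the reply for a FUSE_ACCESS body given the bits we permit."""
    mode, _padding = ACCESS_BODY.unpack_from(body)
    denied = mode & ~permitted
    error = -errno.EACCES if denied else 0
    # The response body is empty, so the length is just the header size.
    return RESPONSE_HEADER.pack(RESPONSE_HEADER.size, error, request_id)
```

A request with mode 0 (an existence check in access() terms) has no bits to deny and is always allowed by this sketch.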

FUSE_BATCH_FORGET

TODO

{ "name": "fuse_batch_forget_in", "fields": [ {"name": "count", "type": "u32"}, {"name": "padding", "type": "u32"}, {"name": "node_id[0]", "type": "u64"}, {"name": "nlookup[0]", "type": "u64"}, {"name": "[...]", "type": "string"}, {"name": "node_id[count-1]", "type": "u64"}, {"name": "nlookup[count-1]", "type": "u64"} ] }
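
The layout above is a small fixed header followed by count pairs of (node_id, nlookup). A sketch of decoding it, assuming native byte order and that the body passed in starts after the request header; parse_batch_forget is a made-up name.

```python
import struct

BATCH_HEADER = struct.Struct("=II")  # count, padding
FORGET_ONE = struct.Struct("=QQ")    # node_id, nlookup


def parse_batch_forget(body):
    """Return a list of (node_id, nlookup) pairs from a batch forget body."""
    count, _padding = BATCH_HEADER.unpack_from(body, 0)
    needed = BATCH_HEADER.size + count * FORGET_ONE.size
    if len(body) < needed:
        raise ValueError(f"body has {len(body)} bytes, need {needed}")
    return [
        FORGET_ONE.unpack_from(body, BATCH_HEADER.size + i * FORGET_ONE.size)
        for i in range(count)
    ]
```

Checking the length before unpacking avoids trusting a count that is larger than the data actually received.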

FUSE_BMAP

TODO

FUSE_CREATE

TODO

FUSE_COPY_FILE_RANGE

TODO

FUSE_DESTROY

Sent just before the kernel unmounts the filesystem. Might be received by the server after the kernel has terminated the session.

No request or response.

FUSE_FALLOCATE

TODO

FUSE_FLUSH

TODO

FUSE_FORGET

Reduces the reference count of a lookup'd inode.

{ "name": "fuse_forget_in", "fields": [ {"name": "nlookup", "type": "u64"} ] }
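
One way a server might track this is a per-node counter that grows on each successful lookup and shrinks by nlookup on each forget. The sketch below assumes nlookup is the number of lookups to drop and that the node ID comes from the request header; the LookupCounts class is hypothetical.

```python
import struct

FORGET_BODY = struct.Struct("=Q")  # nlookup


class LookupCounts:
    """Tracks how many kernel references each node currently has."""

    def __init__(self):
        self._counts = {}

    def looked_up(self, node_id):
        """Record one successful lookup reply for node_id."""
        self._counts[node_id] = self._counts.get(node_id, 0) + 1

    def forget(self, node_id, body):
        """Apply a FUSE_FORGET body; return True once the node is unused."""
        (nlookup,) = FORGET_BODY.unpack_from(body)
        remaining = self._counts.get(node_id, 0) - nlookup
        if remaining > 0:
            self._counts[node_id] = remaining
            return False
        self._counts.pop(node_id, None)
        return True
```

When forget returns True the server can release whatever state it kept for that node.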

FUSE_FSYNC

TODO

FUSE_FSYNCDIR

TODO

FUSE_GETATTR

TODO

FUSE_GETLK

TODO

FUSE_GETXATTR

{ "name": "FUSE_GETXATTR", "fields": [ {"name": "size", "type": "u32"}, {"name": "padding", "type": "u32"} ] }

FUSE_INIT

{ "name": "FUSE_INIT (v7.2)", "fields": [
  {"embed": "fuse_request_header", "size": 40},
  {"name": "major", "type": "u32"},
  {"name": "minor", "type": "u32"}
] }

{ "name": "FUSE_INIT (v7.6)", "fields": [
  {"embed": "fuse_request_header", "size": 40},
  {"name": "major", "type": "u32"},
  {"name": "minor", "type": "u32"},
  {"name": "max_readahead", "type": "u32"},
  {"name": "flags", "type": "u32"}
] }

Negotiated features ("flags"):

feature                    bitmask    version
FUSE_ASYNC_READ            0x1        v7.6
FUSE_POSIX_LOCKS           0x2        v7.7
FUSE_FILE_OPS              0x4        v7.9
FUSE_ATOMIC_O_TRUNC        0x8        v7.9
FUSE_EXPORT_SUPPORT        0x10       v7.10
FUSE_BIG_WRITES            0x20       v7.10
FUSE_DONT_MASK             0x40       v7.12
FUSE_SPLICE_WRITE          0x80       v7.20
FUSE_SPLICE_MOVE           0x100      v7.20
FUSE_SPLICE_READ           0x200      v7.20
FUSE_FLOCK_LOCKS           0x400      v7.17
FUSE_HAS_IOCTL_DIR         0x800      v7.20
FUSE_AUTO_INVAL_DATA       0x1000     v7.20
FUSE_DO_READDIRPLUS        0x2000     v7.21
FUSE_READDIRPLUS_AUTO      0x4000     v7.21
FUSE_ASYNC_DIO             0x8000     v7.22
FUSE_WRITEBACK_CACHE       0x10000    v7.23
FUSE_NO_OPEN_SUPPORT       0x20000    v7.24
FUSE_PARALLEL_DIROPS       0x40000    v7.25
FUSE_HANDLE_KILLPRIV       0x80000    v7.26
FUSE_POSIX_ACL             0x100000   v7.26
FUSE_ABORT_ERROR           0x200000   v7.27
FUSE_MAX_PAGES             0x400000   v7.28
FUSE_CACHE_SYMLINKS        0x800000   v7.28
FUSE_NO_OPENDIR_SUPPORT    0x1000000  v7.29
FUSE_EXPLICIT_INVAL_DATA   0x2000000  v7.30

{ "name": "fuse_init_out (v7.2)", "fields": [
  {"name": "major", "type": "u32"},
  {"name": "minor", "type": "u32"}
] }

{ "name": "fuse_init_out (v7.6)", "fields": [
  {"name": "major", "type": "u32"},
  {"name": "minor", "type": "u32"},
  {"name": "max_readahead", "type": "u32"},
  {"name": "flags", "type": "u32"},
  {"name": "padding", "type": "u32"},
  {"name": "max_write", "type": "u32"}
] }

{ "name": "fuse_init_out (v7.13)", "fields": [
  {"name": "major", "type": "u32"},
  {"name": "minor", "type": "u32"},
  {"name": "max_readahead", "type": "u32"},
  {"name": "flags", "type": "u32"},
  {"name": "max_background", "type": "u16"},
  {"name": "congestion_threshold", "type": "u16"},
  {"name": "max_write", "type": "u32"}
] }

{ "name": "fuse_init_out (v7.23)", "fields": [
  {"name": "major", "type": "u32"},
  {"name": "minor", "type": "u32"},
  {"name": "max_readahead", "type": "u32"},
  {"name": "flags", "type": "u32"},
  {"name": "max_background", "type": "u16"},
  {"name": "congestion_threshold", "type": "u16"},
  {"name": "max_write", "type": "u32"},
  {"name": "time_gran", "type": "u32"},
  {"name": "padding", "type": "u32"},
  {"name": "padding", "type": "u64"},
  {"name": "padding", "type": "u64"},
  {"name": "padding", "type": "u64"},
  {"name": "padding", "type": "u64"}
] }
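
The fuse_init_out layouts above can be sketched as a version-dependent encoder. The cutoffs are taken from the version labels on the structs in this section; versions in between are not covered by these notes, so treat the boundaries as approximate. The default values (max_write of 4096, time_gran of 1) are placeholders, and masking the flags against what the kernel offered is an assumption about how negotiation is meant to work. Function names are hypothetical.

```python
import struct

INIT_OUT_V7_2 = struct.Struct("=II")            # 8 bytes
INIT_OUT_V7_6 = struct.Struct("=IIII4xI")       # 24 bytes
INIT_OUT_V7_13 = struct.Struct("=IIIIHHI")      # 24 bytes
INIT_OUT_V7_23 = struct.Struct("=IIIIHHII36x")  # 64 bytes


def negotiate_flags(offered_by_kernel, wanted_by_server):
    """Keep only the features that both sides asked for."""
    return offered_by_kernel & wanted_by_server


def pack_init_out(minor, max_readahead=0, flags=0, max_write=4096,
                  max_background=0, congestion_threshold=0, time_gran=1,
                  major=7):
    """Encode fuse_init_out with the newest layout not newer than minor."""
    if minor >= 23:
        return INIT_OUT_V7_23.pack(major, minor, max_readahead, flags,
                                   max_background, congestion_threshold,
                                   max_write, time_gran)
    if minor >= 13:
        return INIT_OUT_V7_13.pack(major, minor, max_readahead, flags,
                                   max_background, congestion_threshold,
                                   max_write)
    if minor >= 6:
        return INIT_OUT_V7_6.pack(major, minor, max_readahead, flags,
                                  max_write)
    return INIT_OUT_V7_2.pack(major, minor)
```

The padding fields are written as zero bytes through the "x" format codes, so callers never have to supply them.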

FUSE_INTERRUPT

TODO

FUSE_IOCTL

TODO

FUSE_LINK

TODO

FUSE_LISTXATTR

TODO

FUSE_LOOKUP

The request is a NUL-terminated bytestring. Incoming name length is constrained to some maximum length by the kernel:

  • Linux: 1024 (FUSE_NAME_MAX)
  • FreeBSD: 255 (MAXNAMLEN)

Response is a fuse_entry_out.

Notes:

  • There are two places the inode can be written, which get saved in different kernel data structures. I couldn't figure out what happens if they're different, but probably nothing good.
  • The FUSE inode IDs are always 64-bit, but kernels usually have an inode sized to the machine word. Linux XORs the high and low halves of the u64, while BSD just assigns u64 to u32 and lets the compiler do what it wants.
  • In early versions of FUSE, fuse_entry_out::nodeid had to be non-zero. Lookup failure was handled by ENOENT only. This restriction was lifted in v7.6, so that a lookup response with nodeid == 0 meant a cacheable lookup failure.
  • Each successful lookup increments the node's reference count, which is decremented by FUSE_FORGET.
    • Exception: If fuse_attr::mode isn't a valid file type (S_IFREG etc), the kernel will drop the response and won't enqueue a FUSE_FORGET. A server that thought the response was successful would be stuck with that refcount forever.
    • This seems like a kernel bug, TODO report and send a patch.
  • Changes in the size of fuse_attr propagate to all the structs that contain it, including fuse_attr_out. The fuse kernel header has constants like FUSE_COMPAT_ENTRY_OUT_SIZE set to the "old" struct size.
  • It looks like returning nodeid: 1 might also send an EIO to the client, because this node ID is reserved for the root node.
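
The two ways of squeezing a 64-bit node ID into a 32-bit inode number, as described in the notes above, can be sketched as follows. This only models the arithmetic; the function names are made up, and the real kernels only do this when the native inode type is narrower than 64 bits.

```python
MASK32 = 0xFFFFFFFF


def squash_xor(ino64):
    """Fold a 64-bit ID into 32 bits by XORing its high and low halves."""
    return ((ino64 >> 32) ^ ino64) & MASK32


def squash_truncate(ino64):
    """Keep only the low 32 bits, as a plain integer narrowing would."""
    return ino64 & MASK32


if __name__ == "__main__":
    for ino in (0x1234, 0x0000000100000001, 0xDEADBEEF00000000):
        print(hex(ino), hex(squash_xor(ino)), hex(squash_truncate(ino)))
```

Either scheme can map distinct 64-bit IDs to the same 32-bit value, so a server that cares about stable inode numbers on such systems may want to keep its IDs below 2^32.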

FUSE_LSEEK

TODO

FUSE_MKDIR

TODO

FUSE_MKNOD

TODO

FUSE_NOTIFY_REPLY

TODO

FUSE_OPEN

TODO

FUSE_OPENDIR

TODO

FUSE_POLL

TODO

FUSE_READ

TODO

FUSE_READDIR

TODO

FUSE_READDIRPLUS

TODO

https://tools.ietf.org/html/rfc1813#section-3.3.17

3.3.17 Procedure 17: READDIRPLUS - Extended read from directory

FUSE_READLINK

TODO

FUSE_RELEASE

{ "name": "FUSE_RELEASE (v7.2)", "fields": [
  {"embed": "fuse_request_header", "size": 40},
  {"name": "fh", "type": "u64"},
  {"name": "flags", "type": "u32"},
  {"name": "padding", "type": "u32"}
] }

{ "name": "FUSE_RELEASE (v7.8)", "fields": [
  {"embed": "fuse_request_header", "size": 40},
  {"name": "fh", "type": "u64"},
  {"name": "flags", "type": "u32"},
  {"name": "release_flags", "type": "u32"},
  {"name": "lock_owner", "type": "u64"}
] }

TODO

FUSE_RELEASEDIR

TODO

FUSE_REMOVEXATTR

TODO

FUSE_RENAME

{ "name": "fuse_rename_request", "fields": [ {"embed": "fuse_request_header", "size": 40}, {"name": "newdir", "type": "u64"}, {"name": "old_name", "type": "string"}, {"name": "new_name", "type": "string"} ] }

FUSE_RENAME2

{ "name": "fuse_rename2_request", "fields": [ {"embed": "fuse_request_header", "size": 40}, {"name": "newdir", "type": "u64"}, {"name": "flags", "type": "u32"}, {"name": "padding", "type": "u32"}, {"name": "old_name", "type": "string"}, {"name": "new_name", "type": "string"} ] }
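
A sketch of decoding the fuse_rename2_request body shown above. It assumes the body passed in starts after the 40-byte request header and that the two "string" fields are NUL-terminated names placed back to back, in the same style as the FUSE_LOOKUP request; parse_rename2 is a hypothetical name.

```python
import struct

RENAME2_FIXED = struct.Struct("=QII")  # newdir, flags, padding


def parse_rename2(body):
    """Return (newdir, flags, old_name, new_name) from a rename2 body."""
    newdir, flags, _padding = RENAME2_FIXED.unpack_from(body)
    parts = body[RENAME2_FIXED.size:].split(b"\0")
    # Two terminated names split into exactly [old, new, b""].
    if len(parts) != 3 or parts[2] != b"":
        raise ValueError("expected two NUL-terminated names")
    return newdir, flags, parts[0], parts[1]
```

Names are kept as bytes because filenames are not guaranteed to be valid text in any particular encoding.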

TODO

https://lwn.net/Articles/606237/

https://www.spinics.net/lists/linux-fsdevel/msg72068.html

https://www.systutorials.com/docs/linux/man/2-renameat2/

FUSE_RMDIR

TODO

FUSE_SETATTR

TODO

FUSE_SETLK

TODO

https://sourceforge.net/p/fuse/mailman/message/35018434/

FUSE supports mandatory locking and BSD flock if that's your thing < http://0pointer.de/blog/projects/locking.html >.

FUSE_SETLKW

TODO

FUSE_SETXATTR

{ "name": "FUSE_SETXATTR", "fields": [ {"embed": "fuse_request_header", "size": 40}, {"name": "size", "type": "u32"}, {"name": "flags", "type": "u32"} ] }

FUSE_STATFS

TODO

FUSE_SYMLINK

TODO

FUSE_UNLINK

TODO

FUSE_WRITE

TODO

CUSE_INIT

TODO

Appendix A: Structs

fuse_attr

{ "name": "fuse_attr (v7.2)", "fields": [
  {"name": "ino", "type": "u64"},
  {"name": "size", "type": "u64"},
  {"name": "blocks", "type": "u64"},
  {"name": "atime", "type": "u64"},
  {"name": "mtime", "type": "u64"},
  {"name": "ctime", "type": "u64"},
  {"name": "atimensec", "type": "u32"},
  {"name": "mtimensec", "type": "u32"},
  {"name": "ctimensec", "type": "u32"},
  {"name": "mode", "type": "u32"},
  {"name": "nlink", "type": "u32"},
  {"name": "uid", "type": "u32"},
  {"name": "gid", "type": "u32"},
  {"name": "rdev", "type": "u32"}
] }

{ "name": "fuse_attr (v7.9)", "fields": [
  {"name": "ino", "type": "u64"},
  {"name": "size", "type": "u64"},
  {"name": "blocks", "type": "u64"},
  {"name": "atime", "type": "u64"},
  {"name": "mtime", "type": "u64"},
  {"name": "ctime", "type": "u64"},
  {"name": "atimensec", "type": "u32"},
  {"name": "mtimensec", "type": "u32"},
  {"name": "ctimensec", "type": "u32"},
  {"name": "mode", "type": "u32"},
  {"name": "nlink", "type": "u32"},
  {"name": "uid", "type": "u32"},
  {"name": "gid", "type": "u32"},
  {"name": "rdev", "type": "u32"},
  {"name": "blksize", "type": "u32"},
  {"name": "padding", "type": "u32"}
] }

fuse_entry_out

{ "name": "fuse_entry_out", "fields": [ {"name": "nodeid", "type": "u64"}, {"name": "generation", "type": "u64"}, {"name": "entry_valid", "type": "u64"}, {"name": "attr_valid", "type": "u64"}, {"name": "entry_valid_nsec", "type": "u32"}, {"name": "attr_valid_nsec", "type": "u32"}, {"embed": "fuse_attr", "size": 0} ] }
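
Since fuse_entry_out embeds fuse_attr, its size depends on which fuse_attr layout is in use. The sketch below encodes both variants so the size difference is visible; it assumes native byte order and picks the layout from the protocol minor version, with the v7.9 cutoff taken from the struct labels above. The function name and the default blksize are hypothetical.

```python
import struct

ENTRY_OUT_PREFIX = struct.Struct("=4Q2I")  # 40 bytes before the fuse_attr
FUSE_ATTR_V7_2 = struct.Struct("=6Q8I")    # 80 bytes
FUSE_ATTR_V7_9 = struct.Struct("=6Q10I")   # 88 bytes (adds blksize, padding)


def pack_entry_out(minor, nodeid, generation, attr, blksize=4096,
                   entry_valid=0, attr_valid=0):
    """Encode fuse_entry_out; attr is the 14 v7.2 fuse_attr fields in order."""
    prefix = ENTRY_OUT_PREFIX.pack(nodeid, generation,
                                   entry_valid, attr_valid, 0, 0)
    if minor >= 9:
        return prefix + FUSE_ATTR_V7_9.pack(*attr, blksize, 0)
    return prefix + FUSE_ATTR_V7_2.pack(*attr)
```

The two results are 120 and 128 bytes long; the smaller one should correspond to the FUSE_COMPAT_ENTRY_OUT_SIZE constant mentioned in the FUSE_LOOKUP notes.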