The FUSE Protocol

Filesystem in Userspace (FUSE) is a protocol for implementing UNIX-style filesystems outside of the OS kernel. It was initially developed for Linux, and has seen some limited adoption by other kernels.

I wanted to write a library for the userspace side of FUSE as an exercise in learning Rust, but got stuck on a lack of documentation regarding the protocol, its versions, and how it varies across kernels. This page contains my notes on the FUSE protocol.

Versions

The FUSE protocol is versioned with a (major, minor) tuple. Backwards compatibility is freely broken in "minor" releases, so it's not a SemVer-style version. I tend to think of it as separate "handshake version" and "protocol version", each being equivalent to a SemVer major version.

Protocol Version	Linux Release	Date
v7.2	2.6.14	2005-10-27
v7.3 (diff)	2.6.15	2006-01-03
v7.6 (diff)	2.6.16	2006-03-20
v7.7 (diff)	2.6.18	2006-09-20
v7.8 (diff)	2.6.20	2007-02-05
v7.9 (diff)	2.6.24	2008-01-24
v7.10 (diff)	2.6.28	2008-12-25
v7.11 (diff)	2.6.29	2009-03-23
v7.12 (diff)	2.6.31	2009-09-09
v7.13 (diff)	2.6.32	2009-12-03
v7.14 (diff)	2.6.35	2010-08-01
v7.15 (diff)	2.6.36	2010-10-20
v7.16 (diff)	2.6.38	2011-03-14
v7.17 (diff)	3.1	2011-10-24
v7.18 (diff)	3.3	2012-03-18
v7.19 (diff)	3.5	2012-07-21
v7.20 (diff)	3.6	2012-09-30
v7.21 (diff)	3.9	2013-04-28
v7.22 (diff)	3.10	2013-06-30
v7.23 (diff)	3.15	2014-06-08
v7.24 (diff)	4.5	2016-03-13
v7.25 (diff)	4.7	2016-07-24
v7.26 (diff)	4.9	2016-12-11
v7.27 (diff)	4.18	2018-08-12
v7.28 (diff)	4.20	2018-12-23
v7.29 (diff)	5.1	2019-05-05
v7.31 (diff)	5.2	2019-07-07
v7.32 (diff)	5.10	2020-12-13
v7.33 (diff)	5.11	2021-02-14
v7.34 (diff)	5.14	2021-08-29
v7.35 (diff)	5.16	2022-01-09
v7.36 (diff)	5.17	2022-03-20
v7.37 (diff)	6.1	2022-12-11
v7.38 (diff)	6.2	2023-02-19

Wire Format

TODO

{ "name": "fuse_request_header", "fields": [ {"name": "length", "type": "u32"}, {"name": "opcode", "type": "u32"}, {"name": "request_id", "type": "u64"}, {"name": "node_id", "type": "u64"}, {"name": "user_id", "type": "u32"}, {"name": "group_id", "type": "u32"}, {"name": "task_id", "type": "u32"}, {"name": "padding", "type": "u32"} ] }

{ "name": "fuse_response_header", "fields": [ {"name": "length", "type": "u32"}, {"name": "error", "type": "i32"}, {"name": "request_id", "type": "u64"} ] }
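As a rough illustration of the two header layouts above, here is a minimal Rust sketch that decodes a request header and encodes a response. It assumes the fields are packed in host byte order with no implicit padding, as the layouts suggest. The names `RequestHeader`, `parse_request_header`, and `encode_response` are made up for this example and do not come from any existing library.

```rust
use std::convert::TryInto;

pub const REQUEST_HEADER_SIZE: usize = 40;
pub const RESPONSE_HEADER_SIZE: usize = 16;

/// Mirrors the fuse_request_header layout (the trailing padding is dropped).
#[derive(Debug, PartialEq)]
pub struct RequestHeader {
    pub length: u32,
    pub opcode: u32,
    pub request_id: u64,
    pub node_id: u64,
    pub user_id: u32,
    pub group_id: u32,
    pub task_id: u32,
}

fn read_u32(buf: &[u8], at: usize) -> u32 {
    u32::from_ne_bytes(buf[at..at + 4].try_into().unwrap())
}

fn read_u64(buf: &[u8], at: usize) -> u64 {
    u64::from_ne_bytes(buf[at..at + 8].try_into().unwrap())
}

/// Decodes the first 40 bytes of a request; returns None if the buffer is too short.
pub fn parse_request_header(buf: &[u8]) -> Option<RequestHeader> {
    if buf.len() < REQUEST_HEADER_SIZE {
        return None;
    }
    Some(RequestHeader {
        length: read_u32(buf, 0),
        opcode: read_u32(buf, 4),
        request_id: read_u64(buf, 8),
        node_id: read_u64(buf, 16),
        user_id: read_u32(buf, 24),
        group_id: read_u32(buf, 28),
        task_id: read_u32(buf, 32),
    })
}

/// Builds a response: 16-byte header followed by the opcode-specific body.
/// `error` is zero on success or a negated errno value on failure.
pub fn encode_response(request_id: u64, error: i32, body: &[u8]) -> Vec<u8> {
    let length = (RESPONSE_HEADER_SIZE + body.len()) as u32;
    let mut out = Vec::with_capacity(length as usize);
    out.extend_from_slice(&length.to_ne_bytes());
    out.extend_from_slice(&error.to_ne_bytes());
    out.extend_from_slice(&request_id.to_ne_bytes());
    out.extend_from_slice(body);
    out
}
```

The `length` field counts the header itself, so an empty-bodied response has a length of 16.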

TODO

Mounting

/dev/fuse

TODO

fusermount

TODO

Special Topics

Extended Attributes

Extended attributes or "xattrs" are key-value items that may be associated with filesystem nodes. Keys are C-style null-terminated strings; values are arbitrary byte blobs. See xattr(7) for more details on their use and semantics.

FUSE supports xattrs through four opcodes that directly map to libattr functions, documented by:

getxattr(2): FUSE_GETXATTR
setxattr(2): FUSE_SETXATTR
listxattr(2): FUSE_LISTXATTR
removexattr(2): FUSE_REMOVEXATTR

Because no UNIX API would be complete without some sharp corners to stub your toes on, the libattr authors invented ENOATTR. There is no such error code defined in the POSIX standard and it's not guaranteed to be defined by system headers, so libattr defines ENOATTR equal to ENODATA if it's not already set:

ENOATTR
       The named attribute does not exist, or the process has no
       access to this attribute. (ENOATTR is defined to be a synonym
       for ENODATA in <attr/xattr.h>.)

Read that again!

The API of extended attributes depends on the content of third-party userland headers!

And if that's not enough, ENODATA is itself optional – UNIX systems that don't implement the XSI STREAMS Option Group might not have a definition of ENODATA. FreeBSD is in this category.

Platform	`ENODATA`	`ENOATTR`
Linux (x86-64)	61
Linux (sparc)	111
FreeBSD (x86-64)		87

In practice I've found it easiest to hardcode the error behavior to whatever that platform's native filesystems do, even if the resulting behavior deviates from the libattr manpages.
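One way to hardcode this is a small lookup keyed on the platform. The sketch below covers only the three rows of the table above and returns None for anything else. The function names are hypothetical, and the OS and architecture strings are assumed to match what Rust reports in `std::env::consts`.

```rust
/// Error code to report for a missing extended attribute, using only the
/// values from the table above. Returns None for platforms not listed there.
pub fn missing_xattr_errno(os: &str, arch: &str) -> Option<i32> {
    match (os, arch) {
        ("linux", "x86_64") => Some(61), // ENODATA
        ("linux", "sparc") | ("linux", "sparc64") => Some(111), // ENODATA
        ("freebsd", "x86_64") => Some(87), // ENOATTR
        _ => None,
    }
}

/// Same lookup for the platform this code was compiled for.
pub fn native_missing_xattr_errno() -> Option<i32> {
    missing_xattr_errno(std::env::consts::OS, std::env::consts::ARCH)
}
```

Returning None instead of guessing forces the caller to decide what to do on a platform nobody has checked yet.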

If this isn't handled well by the FUSE library, then filesystem authors will try to do it themselves and probably get it wrong. See [tech-kern@netbsd.org] ENOATTR vs ENODATA for the trouble caused by a filesystem assuming ENODATA == ENOATTR.

CUSE

Character Devices in Userspace (CUSE) lets a FUSE server export operations as a Linux character device instead of a filesystem. Most of the behavior is the same, and the CUSE "mount" acts like a filesystem containing a single file.

Differences from standard FUSE:

The server must open /dev/cuse directly; there is no suid helper like the one used for filesystem mounts.
The kernel handshakes with CUSE_INIT.
The protocol has a reduced set of opcodes, primarily FUSE_READ, FUSE_WRITE, and FUSE_IOCTL.

Block Devices (fuseblk)

TODO

This seems to exist so the kernel can use a FUSE server to interpret bytes on a block device. ntfs-3g is the main user?

Debugging

TODO

The user can mount a "control filesystem" to inspect FUSE state and forcefully abort an existing FUSE server mount.

mount -t fusectl none /sys/fs/fuse/connections
ls /sys/fs/fuse/connections
# 42/  44/  46/  47/  48/  50/  51/  52/  53/
ls /sys/fs/fuse/connections/42
# abort  congestion_threshold  max_background  waiting

There are some basic docs at https://www.kernel.org/doc/Documentation/filesystems/fuse.txt

Multi-Threading

Background: When a file is opened, the Linux kernel creates a "file description" for the I/O state, and returns a "file descriptor" to userland. That descriptor can be freely passed to the dup(2) functions to duplicate the descriptor, but all of the duplicates still share a single underlying description.

The FUSE kernel driver implicitly locks access to the /dev/fuse file descriptor so that each read() and write() syscall is atomic. This implies that multiple threads can safely share the descriptor, but also that they will face lock contention and reduced performance.

To get the best performance out of a multi-threaded filesystem server, open /dev/fuse once as a "session FD" and again in each thread as "worker FDs". After initializing the session with a standard FUSE handshake, the workers can be associated with the session by calling ioctl(worker_fd, FUSE_DEV_IOC_CLONE, &session_fd).

This allows multiple threads to serve FUSE requests without contending for the descriptor lock.
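A minimal sketch of the worker setup in Rust, with several assumptions. The ioctl number is derived from the Linux `_IOR(229, 0, uint32_t)` definition using the generic (x86-64 style) encoding. `ioctl` is declared by hand with the glibc signature instead of pulling in a bindings crate. `open_worker` is a hypothetical helper name. The session descriptor is assumed to have completed its handshake already.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::io::AsRawFd;

extern "C" {
    // Hand-written declaration (assumes the glibc prototype).
    fn ioctl(fd: c_int, request: c_ulong, ...) -> c_int;
}

/// Linux _IOR(type, nr, size) for architectures with the generic ioctl encoding.
const fn ior(ty: u32, nr: u32, size: u32) -> u32 {
    (2u32 << 30) | (size << 16) | (ty << 8) | nr
}

/// _IOR(229, 0, uint32_t)
pub const FUSE_DEV_IOC_CLONE: u32 = ior(229, 0, 4);

/// Opens a fresh /dev/fuse descriptor and attaches it to an existing session.
pub fn open_worker(session: &File) -> io::Result<File> {
    let worker = OpenOptions::new().read(true).write(true).open("/dev/fuse")?;
    // The ioctl argument is a pointer to the session's descriptor number.
    let mut session_fd: u32 = session.as_raw_fd() as u32;
    let rc = unsafe {
        ioctl(
            worker.as_raw_fd(),
            FUSE_DEV_IOC_CLONE as c_ulong,
            &mut session_fd as *mut u32,
        )
    };
    if rc < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(worker)
}
```

Each worker thread would call `open_worker` once at startup and then read requests from its own `File`.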

POSIX ACLs

TODO

Notifications

TODO

Locks

TODO

Opcodes

name	value	version
FUSE_LOOKUP	1
FUSE_FORGET	2
FUSE_GETATTR	3
FUSE_SETATTR	4
FUSE_READLINK	5
FUSE_SYMLINK	6
	7
FUSE_MKNOD	8
FUSE_MKDIR	9
FUSE_UNLINK	10
FUSE_RMDIR	11
FUSE_RENAME	12
FUSE_LINK	13
FUSE_OPEN	14
FUSE_READ	15
FUSE_WRITE	16
FUSE_STATFS	17
FUSE_RELEASE	18
	19
FUSE_FSYNC	20
FUSE_SETXATTR	21
FUSE_GETXATTR	22
FUSE_LISTXATTR	23
FUSE_REMOVEXATTR	24
FUSE_FLUSH	25
FUSE_INIT	26
FUSE_OPENDIR	27
FUSE_READDIR	28
FUSE_RELEASEDIR	29
FUSE_FSYNCDIR	30
FUSE_GETLK	31	v7.7
FUSE_SETLK	32	v7.7
FUSE_SETLKW	33	v7.7
FUSE_ACCESS	34	v7.3
FUSE_CREATE	35	v7.3
FUSE_INTERRUPT	36	v7.7
FUSE_BMAP	37	v7.8
FUSE_DESTROY	38	v7.8
FUSE_IOCTL	39	v7.11
FUSE_POLL	40	v7.11
FUSE_NOTIFY_REPLY	41	v7.15
FUSE_BATCH_FORGET	42	v7.16
FUSE_FALLOCATE	43	v7.19
FUSE_READDIRPLUS	44	v7.21
FUSE_RENAME2	45	v7.23
FUSE_LSEEK	46	v7.24
FUSE_COPY_FILE_RANGE	47	v7.28

CUSE_INIT	4096	v7.12

FUSE_ACCESS

If the default_permissions mount option is unset, the kernel will delegate permission checks to the FUSE server.

{ "name": "fuse_access_request", "fields": [ {"embed": "fuse_request_header", "size": 40}, {"name": "mode", "type": "u32"}, {"name": "padding", "type": "u32"} ] }

The mode is a bitmask of requested operations, matching the semantics of the POSIX access() syscall.

Permission	Mode bit
execute	0x1
write	0x2
read	0x4

The response body is empty, but the return value is significant:

Returning 0 means the access is allowed.
Returning -ENOSYS means the access is allowed, and all future accesses are also allowed. The kernel may skip sending further access calls to the FUSE server.
Returning -EACCES means the access is denied due to lack of permissions.

Other return codes are OS-dependent.
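To make the mode bitmask and return values concrete, here is a small sketch of how a server might answer an access request. The `granted` parameter is a stand-in for whatever permission logic the filesystem actually has, and the EACCES value of 13 is hardcoded on the assumption of a Linux or FreeBSD host.

```rust
pub const MODE_EXECUTE: u32 = 0x1;
pub const MODE_WRITE: u32 = 0x2;
pub const MODE_READ: u32 = 0x4;

/// Assumed value of EACCES (13 on Linux and FreeBSD).
const EACCES: i32 = 13;

/// Returns the `error` field for the response header: 0 if every requested
/// bit is in `granted`, otherwise -EACCES.
pub fn access_reply(requested: u32, granted: u32) -> i32 {
    if requested & !granted == 0 {
        0
    } else {
        -EACCES
    }
}
```

A request with no bits set (an existence check) is always allowed by this sketch, since there is nothing to deny.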

FUSE_BATCH_FORGET

TODO

{ "name": "fuse_batch_forget_in", "fields": [ {"name": "count", "type": "u32"}, {"name": "padding", "type": "u32"}, {"name": "node_id[0]", "type": "u64"}, {"name": "nlookup[0]", "type": "u64"}, {"name": "[...]", "type": "string"}, {"name": "node_id[count-1]", "type": "u64"}, {"name": "nlookup[count-1]", "type": "u64"} ] }
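The layout above is a count followed by that many (node_id, nlookup) pairs. A parser for it could look like the sketch below. It assumes host byte order and that `body` starts right after the 40-byte request header. `parse_batch_forget` is a name invented for this example.

```rust
use std::convert::TryInto;

/// Parses the body of a batch-forget request into (node_id, nlookup) pairs.
/// Returns None if the body is shorter than its own count field claims.
pub fn parse_batch_forget(body: &[u8]) -> Option<Vec<(u64, u64)>> {
    if body.len() < 8 {
        return None;
    }
    let count = u32::from_ne_bytes(body[0..4].try_into().unwrap()) as usize;
    // Skip the 4-byte count and the 4-byte padding.
    let items = &body[8..];
    if items.len() < count.checked_mul(16)? {
        return None;
    }
    let mut out = Vec::with_capacity(count);
    for chunk in items.chunks_exact(16).take(count) {
        let node_id = u64::from_ne_bytes(chunk[0..8].try_into().unwrap());
        let nlookup = u64::from_ne_bytes(chunk[8..16].try_into().unwrap());
        out.push((node_id, nlookup));
    }
    Some(out)
}
```

Checking the length before allocating keeps a bogus count from triggering a huge allocation.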

FUSE_BMAP

TODO

FUSE_CREATE

TODO

FUSE_COPY_FILE_RANGE

TODO

FUSE_DESTROY

Sent just before the kernel unmounts the filesystem. Might be received by the server after the kernel has terminated the session.

No request or response.

FUSE_FALLOCATE

TODO

FUSE_FLUSH

TODO

FUSE_FORGET

Reduces the reference count of a lookup'd inode.

{ "name": "fuse_forget_in", "fields": [ {"name": "nlookup", "type": "u64"} ] }
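On the server side this usually means keeping a per-node counter. The sketch below is one possible shape for that bookkeeping, based on my reading that `nlookup` is the amount to subtract. `LookupCounts` and its methods are hypothetical names.

```rust
use std::collections::HashMap;

/// Tracks how many lookups the kernel still holds for each node ID.
#[derive(Default)]
pub struct LookupCounts {
    counts: HashMap<u64, u64>,
}

impl LookupCounts {
    /// Call once for every successful lookup reply.
    pub fn lookup(&mut self, node_id: u64) {
        *self.counts.entry(node_id).or_insert(0) += 1;
    }

    /// Applies a forget. Returns true when the count reaches zero and the
    /// node's entry has been dropped; unknown nodes return false.
    pub fn forget(&mut self, node_id: u64, nlookup: u64) -> bool {
        let remaining = match self.counts.get(&node_id) {
            Some(&n) => n.saturating_sub(nlookup),
            None => return false,
        };
        if remaining == 0 {
            self.counts.remove(&node_id);
            true
        } else {
            self.counts.insert(node_id, remaining);
            false
        }
    }
}
```

The saturating subtraction is a defensive choice so that an unexpectedly large `nlookup` cannot underflow the counter.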

FUSE_FSYNC

TODO

FUSE_FSYNCDIR

TODO

FUSE_GETATTR

TODO

FUSE_GETLK

TODO

FUSE_GETXATTR

{ "name": "FUSE_GETXATTR", "fields": [ {"name": "size", "type": "u32"}, {"name": "padding", "type": "u32"} ] }

FUSE_INIT

{ "name": "FUSE_INIT (v7.2)", "fields": [ {"embed": "fuse_request_header", "size": 40}, {"name": "major", "type": "u32"}, {"name": "minor", "type": "u32"} ] }

{ "name": "FUSE_INIT (v7.6)", "fields": [ {"embed": "fuse_request_header", "size": 40}, {"name": "major", "type": "u32"}, {"name": "minor", "type": "u32"}, {"name": "max_readahead", "type": "u32"}, {"name": "flags", "type": "u32"} ] }

Negotiated features ("flags"):

feature	bitmask	version
FUSE_ASYNC_READ	0x1	v7.6
FUSE_POSIX_LOCKS	0x2	v7.7
FUSE_FILE_OPS	0x4	v7.9
FUSE_ATOMIC_O_TRUNC	0x8	v7.9
FUSE_EXPORT_SUPPORT	0x10	v7.10
FUSE_BIG_WRITES	0x20	v7.10
FUSE_DONT_MASK	0x40	v7.12
FUSE_SPLICE_WRITE	0x80	v7.20
FUSE_SPLICE_MOVE	0x100	v7.20
FUSE_SPLICE_READ	0x200	v7.20
FUSE_FLOCK_LOCKS	0x400	v7.17
FUSE_HAS_IOCTL_DIR	0x800	v7.20
FUSE_AUTO_INVAL_DATA	0x1000	v7.20
FUSE_DO_READDIRPLUS	0x2000	v7.21
FUSE_READDIRPLUS_AUTO	0x4000	v7.21
FUSE_ASYNC_DIO	0x8000	v7.22
FUSE_WRITEBACK_CACHE	0x10000	v7.23
FUSE_NO_OPEN_SUPPORT	0x20000	v7.24
FUSE_PARALLEL_DIROPS	0x40000	v7.25
FUSE_HANDLE_KILLPRIV	0x80000	v7.26
FUSE_POSIX_ACL	0x100000	v7.26
FUSE_ABORT_ERROR	0x200000	v7.27
FUSE_MAX_PAGES	0x400000	v7.28
FUSE_CACHE_SYMLINKS	0x800000	v7.28
FUSE_NO_OPENDIR_SUPPORT	0x1000000	v7.29
FUSE_EXPLICIT_INVAL_DATA	0x2000000	v7.30

{ "name": "fuse_init_out (v7.2)", "fields": [ {"name": "major", "type": "u32"}, {"name": "minor", "type": "u32"} ] }

{ "name": "fuse_init_out (v7.6)", "fields": [ {"name": "major", "type": "u32"}, {"name": "minor", "type": "u32"}, {"name": "max_readahead", "type": "u32"}, {"name": "flags", "type": "u32"}, {"name": "padding", "type": "u32"}, {"name": "max_write", "type": "u32"} ] }

{ "name": "fuse_init_out (v7.13)", "fields": [ {"name": "major", "type": "u32"}, {"name": "minor", "type": "u32"}, {"name": "max_readahead", "type": "u32"}, {"name": "flags", "type": "u32"}, {"name": "max_background", "type": "u16"}, {"name": "congestion_threshold", "type": "u16"}, {"name": "max_write", "type": "u32"} ] }

{ "name": "fuse_init_out (v7.23)", "fields": [ {"name": "major", "type": "u32"}, {"name": "minor", "type": "u32"}, {"name": "max_readahead", "type": "u32"}, {"name": "flags", "type": "u32"}, {"name": "max_background", "type": "u16"}, {"name": "congestion_threshold", "type": "u16"}, {"name": "max_write", "type": "u32"}, {"name": "time_gran", "type": "u32"}, {"name": "padding", "type": "u32"}, {"name": "padding", "type": "u64"}, {"name": "padding", "type": "u64"}, {"name": "padding", "type": "u64"}, {"name": "padding", "type": "u64"} ] }
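The four response layouts can be produced by one encoder that picks a layout from the minor version. The sketch below keys purely on the version labels shown above (v7.2, v7.6, v7.13, v7.23); a real kernel may draw the size boundaries slightly differently, so treat the thresholds as assumptions. `InitReply`, `negotiate_flags`, and `encode_init_out` are illustrative names.

```rust
/// Values a hypothetical server wants to send back in its init response.
#[derive(Default)]
pub struct InitReply {
    pub major: u32,
    pub minor: u32,
    pub max_readahead: u32,
    pub flags: u32,
    pub max_background: u16,
    pub congestion_threshold: u16,
    pub max_write: u32,
    pub time_gran: u32,
}

/// Keeps only the features that the kernel offered and the server wants.
pub fn negotiate_flags(kernel_flags: u32, wanted_flags: u32) -> u32 {
    kernel_flags & wanted_flags
}

/// Encodes the response body (without the response header) in host byte order.
pub fn encode_init_out(r: &InitReply) -> Vec<u8> {
    let mut out = Vec::with_capacity(64);
    out.extend_from_slice(&r.major.to_ne_bytes());
    out.extend_from_slice(&r.minor.to_ne_bytes());
    if r.minor < 6 {
        return out; // v7.2 layout: 8 bytes
    }
    out.extend_from_slice(&r.max_readahead.to_ne_bytes());
    out.extend_from_slice(&r.flags.to_ne_bytes());
    if r.minor < 13 {
        out.extend_from_slice(&0u32.to_ne_bytes()); // v7.6 layout: padding
    } else {
        out.extend_from_slice(&r.max_background.to_ne_bytes());
        out.extend_from_slice(&r.congestion_threshold.to_ne_bytes());
    }
    out.extend_from_slice(&r.max_write.to_ne_bytes());
    if r.minor < 23 {
        return out; // v7.6 and v7.13 layouts: 24 bytes
    }
    out.extend_from_slice(&r.time_gran.to_ne_bytes());
    out.extend_from_slice(&[0u8; 36]); // padding: one u32 plus four u64
    out // v7.23 layout: 64 bytes
}
```

The v7.6 and v7.13 layouts have the same size because the two u16 fields took over the old padding slot.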

FUSE_INTERRUPT

TODO

FUSE_IOCTL

TODO

FUSE_LINK

TODO

FUSE_LISTXATTR

TODO

FUSE_LOOKUP

The request is a NUL-terminated bytestring. Incoming name length is constrained to some maximum length by the kernel:

Linux: 1024 (FUSE_NAME_MAX)
FreeBSD: 255 (MAXNAMLEN)

Response is a fuse_entry_out.
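A defensive parser for the request body might look like the sketch below. It uses the Linux limit quoted above, assumes `body` is everything after the request header, and treats an empty name as invalid (that last rule is my own choice, not something the protocol notes above require). `parse_lookup_name` is a hypothetical name.

```rust
use std::ffi::CStr;

/// Linux limit quoted above; FreeBSD would use 255.
pub const FUSE_NAME_MAX: usize = 1024;

/// Returns the name without its trailing NUL, or None if the body is not a
/// single NUL-terminated string of acceptable length.
pub fn parse_lookup_name(body: &[u8]) -> Option<&[u8]> {
    // Rejects a missing terminator and any interior NUL bytes.
    let name = CStr::from_bytes_with_nul(body).ok()?.to_bytes();
    if name.is_empty() || name.len() > FUSE_NAME_MAX {
        return None;
    }
    Some(name)
}
```

Names are returned as raw bytes because nothing guarantees that they are valid UTF-8.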

Notes:

There are two places the inode can be written, which get saved in different kernel data structures. I couldn't figure out what happens if they're different, but probably nothing good.
The FUSE inode IDs are always 64-bit, but kernels usually have an inode sized to the machine word. Linux XORs the high and low halves of the u64, while BSD just assigns u64 to u32 and lets the compiler do what it wants.
In early versions of FUSE, fuse_entry_out::nodeid had to be non-zero. Lookup failure was handled by ENOENT only. This restriction was lifted in v7.6, so that a lookup response with nodeid == 0 meant a cacheable lookup failure.
Each successful lookup increments the node's reference count, which is decremented by FUSE_FORGET.
- Exception: If fuse_attr::mode isn't a valid file type (S_IFREG etc.), the kernel will drop the response and won't enqueue a FUSE_FORGET. A server that thought the response was successful would be stuck with that refcount forever.
- This seems like a kernel bug, TODO report and send a patch.
Changes in the size of fuse_attr propagate to all the structs that contain it, including fuse_attr_out. The fuse kernel header has constants like FUSE_COMPAT_ENTRY_OUT_SIZE set to the "old" struct size.
It looks like returning nodeid: 1 might also send an EIO to the client, because this node ID is reserved for the root node.
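The two inode-narrowing behaviors mentioned in the notes above can be written out as follows. This is a sketch of the arithmetic only, assuming a 32-bit kernel inode type; the function names are made up.

```rust
/// Linux-style folding as described above: XOR the high and low 32-bit halves.
pub fn fold_ino_xor(ino: u64) -> u32 {
    (ino as u32) ^ ((ino >> 32) as u32)
}

/// BSD-style narrowing as described above: keep the low 32 bits only.
pub fn fold_ino_truncate(ino: u64) -> u32 {
    ino as u32
}
```

Either way, distinct 64-bit IDs can collide after narrowing, so a server that cares about stable inode numbers on such systems may want to keep its IDs within 32 bits.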

FUSE_LSEEK

TODO

FUSE_MKDIR

TODO

FUSE_MKNOD

TODO

FUSE_NOTIFY_REPLY

TODO

FUSE_OPEN

TODO

FUSE_OPENDIR

TODO

FUSE_POLL

TODO

FUSE_READ

TODO

FUSE_READDIR

TODO

FUSE_READDIRPLUS

TODO

https://tools.ietf.org/html/rfc1813#section-3.3.17

3.3.17 Procedure 17: READDIRPLUS - Extended read from directory

FUSE_READLINK

TODO

FUSE_RELEASE

{ "name": "FUSE_RELEASE (v7.2)", "fields": [ {"embed": "fuse_request_header", "size": 40}, {"name": "fh", "type": "u64"}, {"name": "flags", "type": "u32"}, {"name": "padding", "type": "u32"} ] }

{ "name": "FUSE_RELEASE (v7.8)", "fields": [ {"embed": "fuse_request_header", "size": 40}, {"name": "fh", "type": "u64"}, {"name": "flags", "type": "u32"}, {"name": "release_flags", "type": "u32"}, {"name": "lock_owner", "type": "u64"} ] }

TODO

FUSE_RELEASEDIR

TODO

FUSE_REMOVEXATTR

TODO

FUSE_RENAME

{ "name": "fuse_rename_request", "fields": [ {"embed": "fuse_request_header", "size": 40}, {"name": "newdir", "type": "u64"}, {"name": "old_name", "type": "string"}, {"name": "new_name", "type": "string"} ] }
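Parsing the layout above comes down to reading one integer and splitting two strings. The sketch below assumes that "string" means a NUL-terminated bytestring (as in FUSE_LOOKUP), that the two names are packed back to back, and that `body` starts after the request header. `RenameRequest` and `parse_rename` are illustrative names.

```rust
use std::convert::TryInto;

#[derive(Debug, PartialEq)]
pub struct RenameRequest<'a> {
    pub newdir: u64,
    pub old_name: &'a [u8],
    pub new_name: &'a [u8],
}

/// Splits a rename body into the target directory and the two names.
/// Returns None if the body is truncated or a terminator is missing.
pub fn parse_rename<'a>(body: &'a [u8]) -> Option<RenameRequest<'a>> {
    if body.len() < 8 {
        return None;
    }
    let newdir = u64::from_ne_bytes(body[0..8].try_into().unwrap());
    let names = &body[8..];
    let first_nul = names.iter().position(|&b| b == 0)?;
    let old_name = &names[..first_nul];
    let rest = &names[first_nul + 1..];
    let second_nul = rest.iter().position(|&b| b == 0)?;
    let new_name = &rest[..second_nul];
    Some(RenameRequest { newdir, old_name, new_name })
}
```

The source directory is not in the body; it is the `node_id` from the request header.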

FUSE_RENAME2

{ "name": "fuse_rename2_request", "fields": [ {"embed": "fuse_request_header", "size": 40}, {"name": "newdir", "type": "u64"}, {"name": "flags", "type": "u32"}, {"name": "padding", "type": "u32"}, {"name": "old_name", "type": "string"}, {"name": "new_name", "type": "string"} ] }

TODO

https://lwn.net/Articles/606237/

https://www.spinics.net/lists/linux-fsdevel/msg72068.html

https://www.systutorials.com/docs/linux/man/2-renameat2/

FUSE_RMDIR

TODO

FUSE_SETATTR

TODO

FUSE_SETLK

TODO

https://sourceforge.net/p/fuse/mailman/message/35018434/

FUSE supports mandatory locking and BSD flock if that's your thing < http://0pointer.de/blog/projects/locking.html >.

FUSE_SETLKW

TODO

FUSE_SETXATTR

{ "name": "FUSE_SETXATTR", "fields": [ {"embed": "fuse_request_header", "size": 40}, {"name": "size", "type": "u32"}, {"name": "flags", "type": "u32"} ] }

FUSE_STATFS

TODO

FUSE_SYMLINK

TODO

FUSE_UNLINK

TODO

FUSE_WRITE

TODO

CUSE_INIT

TODO

Appendix A: Structs

fuse_attr

{ "name": "fuse_attr (v7.2)", "fields": [ {"name": "ino", "type": "u64"}, {"name": "size", "type": "u64"}, {"name": "blocks", "type": "u64"}, {"name": "atime", "type": "u64"}, {"name": "mtime", "type": "u64"}, {"name": "ctime", "type": "u64"}, {"name": "atimensec", "type": "u32"}, {"name": "mtimensec", "type": "u32"}, {"name": "ctimensec", "type": "u32"}, {"name": "mode", "type": "u32"}, {"name": "nlink", "type": "u32"}, {"name": "uid", "type": "u32"}, {"name": "gid", "type": "u32"}, {"name": "rdev", "type": "u32"} ] }

{ "name": "fuse_attr (v7.9)", "fields": [ {"name": "ino", "type": "u64"}, {"name": "size", "type": "u64"}, {"name": "blocks", "type": "u64"}, {"name": "atime", "type": "u64"}, {"name": "mtime", "type": "u64"}, {"name": "ctime", "type": "u64"}, {"name": "atimensec", "type": "u32"}, {"name": "mtimensec", "type": "u32"}, {"name": "ctimensec", "type": "u32"}, {"name": "mode", "type": "u32"}, {"name": "nlink", "type": "u32"}, {"name": "uid", "type": "u32"}, {"name": "gid", "type": "u32"}, {"name": "rdev", "type": "u32"}, {"name": "blksize", "type": "u32"}, {"name": "padding", "type": "u32"} ] }

fuse_entry_out

{ "name": "fuse_entry_out", "fields": [ {"name": "nodeid", "type": "u64"}, {"name": "generation", "type": "u64"}, {"name": "entry_valid", "type": "u64"}, {"name": "attr_valid", "type": "u64"}, {"name": "entry_valid_nsec", "type": "u32"}, {"name": "attr_valid_nsec", "type": "u32"}, {"embed": "fuse_attr", "size": 0} ] }