UNIX Syscalls

Overview

On UNIX-like operating systems, userland processes invoke kernel procedures using the "syscall" feature. Each syscall is identified by a "syscall number" and has a short list of parameters, both of which can vary between operating systems, hardware platforms, and configuration options.

Performing a syscall is usually done via a special assembly instruction, though some platforms use other mechanisms (e.g. a vDSO). This page is a catalog of how to invoke syscalls on different UNIX-like platforms.
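To make "a number plus a short list of parameters" concrete before getting into assembly, here is a minimal sketch that goes through the C library's generic syscall(2) wrapper. It assumes a Linux system whose libc provides syscall() and the SYS_* constants in <sys/syscall.h>; the helper names pid_via_syscall_number and write_via_syscall_number are made up for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE       /* asks glibc to declare syscall() */
#endif
#include <stddef.h>
#include <sys/syscall.h>  /* SYS_getpid, SYS_write: per-platform syscall numbers */
#include <unistd.h>       /* syscall(), getpid() */

/* Ask the kernel for our PID by syscall number rather than by name. */
long pid_via_syscall_number(void) {
	return syscall(SYS_getpid);
}

/* write(fd, buf, len) expressed as "number + parameters". */
long write_via_syscall_number(int fd, const void *buf, size_t len) {
	return syscall(SYS_write, fd, buf, len);
}
```

The wrapper hides the trap instruction and the register assignments, which is exactly the part the rest of this page spells out per platform.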

int $0x80 (or int 80h)

int $0x80 (also styled as int 80h) is the traditional syscall instruction on i386 UNIX-like platforms. It triggers a software interrupt that transfers control to the kernel, which inspects the caller's registers and stack to find the syscall number and parameters. It has been obsolete since the mid-2000s for performance reasons, but can still be found in tutorials because it's easier to understand than more modern mechanisms.

Syscalls by OS

(incomplete)

Name            Standard  Linux           Darwin           FreeBSD
access          POSIX     access(2)       access(2)        access(2)
creat           POSIX     creat(2)        creat(2)         creat(2)
exchangedata    -         -               exchangedata(2)  -
fallocate       -         fallocate(2)    -                -
fsync           POSIX     fsync(2)        fsync(2)         fsync(2)
stat            POSIX     stat(2)         stat(2)          stat(2)
fcntl           POSIX     fcntl(2)        fcntl(2)         fcntl(2)
flock           -         flock(2)        -                -
getxattr        -         getxattr(2)     getxattr(2)      -
link            POSIX     link(2)         link(2)          link(2)
listxattr       -         listxattr(2)    listxattr(2)     -
lseek           POSIX     lseek(2)        lseek(2)         lseek(2)
mkdir           POSIX     mkdir(2)        mkdir(2)         mkdir(2)
mknod           POSIX     mknod(2)        mknod(2)         mknod(2)
open            POSIX     open(2)         open(2)          open(2)
opendir         POSIX     opendir(3)      directory(3)     directory(3)
poll            POSIX     poll(2)         poll(2)          poll(2)
read            POSIX     read(2)         read(2)          read(2)
readdir         POSIX     readdir(3)      directory(3)     directory(3)
readlink        POSIX     readlink(2)     readlink(2)      readlink(2)
removexattr     -         removexattr(2)  removexattr(2)   -
rename          POSIX     rename(2)       rename(2)        rename(2)
renameat2       -         rename(2)       -                -
rmdir           POSIX     rmdir(2)        rmdir(2)         rmdir(2)
chmod           POSIX     chmod(2)        chmod(2)         chmod(2)
chown           POSIX     chown(2)        chown(2)         chown(2)
utime           POSIX     utime(2)        utime(3)         utime(3)
setxattr        -         setxattr(2)     setxattr(2)      -
statfs          -         statfs(2)       statfs(2)        statfs(2)
symlink         POSIX     symlink(2)      symlink(2)       symlink(2)
unlink          POSIX     unlink(2)       unlink(2)        unlink(2)
write           POSIX     write(2)        write(2)         write(2)

Linux

Linux syscalls are defined in include/linux/syscalls.h. Syscalls use the same parameter order across platforms, but some (e.g. sys_stat64) are only defined on some platforms, and others (e.g. sys_clone) have different parameters depending on kernel compilation options. Syscall numbers are platform-dependent.

Manpage syscalls(2) lists syscalls and which kernel version they were added in. Manpage syscall(2) lists per-architecture calling conventions and register assignments.
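A quick way to see the platform dependence of syscall numbers is to let the compiler report them. This is only a sketch: it assumes a Linux toolchain whose <sys/syscall.h> exposes the SYS_* constants for the target being compiled, and the two function names are made up for illustration.

```c
#include <sys/syscall.h>  /* SYS_* constants for the platform being compiled for */

/* Each helper returns the number the kernel expects on this platform. */
long syscall_number_write(void) { return SYS_write; }
long syscall_number_exit(void)  { return SYS_exit; }
```

Built for x86-64 these return 1 and 60; built for i386 they return 4 and 1. Those are the same values hard-coded as .L_SYSCALL_WRITE and .L_SYSCALL_EXIT in the assembly listings on this page.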

Documentation and tutorials for implementing a Linux syscall:

Linux: i386 (INT 0x80)

The syscall number is passed in register eax. Syscalls with six or fewer parameters pass them in registers [ebx, ecx, edx, esi, edi, ebp]. Syscalls with more than six parameters use ebx to pass a memory address, in a way that doesn't seem to be well documented.

Linux syscall numbers for i386 are defined in arch/x86/entry/syscalls/syscall_32.tbl.

See above for background on int $0x80.

.data
	.set .L_STDOUT,        1
	.set .L_SYSCALL_EXIT,  1
	.set .L_SYSCALL_WRITE, 4
	.L_message:
		.ascii "Hello, world!\n"
		.set .L_message_len, . - .L_message

.text
	.global _start
	_start:
		# write(STDOUT, message, message_len)
		mov $.L_SYSCALL_WRITE, %eax
		mov $.L_STDOUT,        %ebx
		mov $.L_message,       %ecx
		mov $.L_message_len,   %edx
		int $0x80

		# exit(0)
		mov $.L_SYSCALL_EXIT, %eax
		mov $0,               %ebx
		int $0x80

static linking

as --32 -o hello.o hello.s
ld -m elf_i386 -o hello hello.o
file hello
# hello: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
./hello
# Hello, world!

dynamic linking

as --32 -o hello.o hello.s
ld -m elf_i386 -o hello hello.o \
    --dynamic-linker /lib/ld-linux.so.2 \
    -l:ld-linux.so.2
file hello
# hello: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, not stripped
ldd hello
#     /lib/ld-linux.so.2 (0x56614000)
#     linux-gate.so.1 (0xf77ba000)
./hello
# Hello, world!

Linux: i386 (vDSO)

A vDSO is a shared library injected into processes by the kernel, rather than loaded by the dynamic linker. It's used in i386 Linux to implement faster syscalls via the SYSENTER instruction available in modern 32-bit x86 processors[1] [2]. Later kernel versions also added fast paths for certain read-only syscalls[3].

This code is slightly more complicated than the int 0x80 example because all functions loaded from shared objects (including __kernel_vsyscall) must use indirect calls.

.extern __kernel_vsyscall

.data
	.set .L_STDOUT,        1
	.set .L_SYSCALL_WRITE, 4
	.set .L_SYSCALL_EXIT,  1
	.L_message:
		.ascii "Hello, world!\n"
		.set .L_message_len, . - .L_message

.text
	.global _start
	_start:
		call .L_get_pc_thunk.esi
		add  $_GLOBAL_OFFSET_TABLE_, %esi

		# write(STDOUT, message, message_len)
		mov  $.L_SYSCALL_WRITE, %eax
		mov  $.L_STDOUT,        %ebx
		mov  $.L_message,       %ecx
		mov  $.L_message_len,   %edx
		call *__kernel_vsyscall@GOT(%esi)

		# exit(0)
		mov  $.L_SYSCALL_EXIT, %eax
		mov  $0,               %ebx
		call *__kernel_vsyscall@GOT(%esi)

	.L_get_pc_thunk.esi:
		mov (%esp), %esi
		ret

The linux-gate.so.1 library that will be available at runtime is not available to the linker when the executable is built. To get the correct symbols and ELF headers into the executable, we need to inject some fake data:

  • --defsym __kernel_vsyscall=0 creates a place for the symbol address to be written to, once resolved. This also prevents the linker from warning about an unresolved symbol.
  • Creating a dummy shared object with ld -shared -soname=linux-gate.so.1 causes the linker to add a DT_NEEDED entry for the vDSO, so the dynamic linker will know to use it as a source of symbol addresses.

The resulting binary is a totally normal dynamic ELF executable.

echo '.type __kernel_vsyscall STT_FUNC' | as --32 -o dummy_so.o
ld -m elf_i386 -shared \
    --defsym __kernel_vsyscall=0 \
    -soname=linux-gate.so.1 \
    -o dummy_so dummy_so.o
as --32 -o hello.o hello.s
ld -m elf_i386 -o hello hello.o \
    --dynamic-linker /lib/ld-linux.so.2 \
    -l:ld-linux.so.2 \
    dummy_so
file hello
# hello: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, not stripped
ldd hello
#     /lib/ld-linux.so.2 (0x56625000)
#     linux-gate.so.1 (0xf77d5000)
./hello
# Hello, world!

Why not auxinfo?

Some articles about the Linux vDSO describe looking up its address using the ELF auxiliary vector. I avoided this because it seems complicated and fussy:

  • AT_SYSINFO provides the address of __kernel_vsyscall directly, but is deprecated[4] and requires the discovered address to be plumbed through client code (or assigned to a magic global in some very early initializer).
  • AT_SYSINFO_EHDR provides the address of the vDSO, which requires further parsing using an ELF library to extract relevant symbol addresses. I don't want my programs to embed ELF parsers, especially when a perfectly good one is available in ld.so.
  • The dynamic linker solution can be trivially extended to other Linux vDSO symbols like __vdso_gettimeofday, again with no ELF parsing needed.

The main disadvantage of my solution is that it can't be used in statically linked executables, which are useful for system recovery tools (e.g. busybox) or minimal Docker containers.
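For comparison, the first step of the auxiliary-vector approach looks roughly like the sketch below. It assumes a libc that provides getauxval() in <sys/auxv.h> (glibc and musl both do, as far as I know); vdso_base and vdso_looks_like_elf are names I made up. It only locates the vDSO image and checks its magic bytes. Actually resolving a symbol such as __kernel_vsyscall would still need the ELF parsing that I'd rather leave to ld.so.

```c
#include <elf.h>       /* ELFMAG, SELFMAG */
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>  /* getauxval(), AT_SYSINFO_EHDR */

/* Returns the vDSO's load address, or NULL if the kernel did not provide one. */
const void *vdso_base(void) {
	return (const void *)getauxval(AT_SYSINFO_EHDR);
}

/* Returns 1 if the mapping at vdso_base() starts with the ELF magic bytes. */
int vdso_looks_like_elf(void) {
	const void *base = vdso_base();
	return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```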

Why not gs:0x10?

I've seen one article recommend using call *%gs:0x10 to invoke __kernel_vsyscall, because GNU libc uses this register to locate its early-initialized magic globals.

Don't do this. Everything I can find about glibc auxv handling indicates that the value of %gs is not part of the GNU libc public ABI, and it seems to point to some internal data structure that happens to have the address of __kernel_vsyscall at offset 0x10 (it used to be 0x18). There are no guarantees that these properties will hold in the future, especially if you want your code to link against non-GNU libc implementations such as musl.

Linux: x86-64

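The syscall number is passed in register rax. Parameters are passed in registers [rdi, rsi, rdx, r10, r8, r9]; note that the fourth is r10 rather than the rcx of the userland calling convention, because the syscall instruction itself overwrites rcx. I haven't found documentation on what x86-64 Linux does for syscalls with more than six parameters. The syscall instruction is used to pass control to the kernel.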
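The register assignment is easiest to get wrong at the fourth argument, which the kernel expects in r10 because the syscall instruction overwrites rcx. The sketch below shows a six-argument helper in GNU C. The name raw_syscall6 is hypothetical, the x86-64 branch assumes GCC or Clang extended asm, and the #else branch (deferring to libc's syscall()) is only there so the file still builds on other Linux targets.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE       /* asks glibc to declare syscall() */
#endif
#include <sys/syscall.h>  /* SYS_* numbers */
#include <unistd.h>       /* syscall(), used only by the fallback branch */

/* Hypothetical helper: issue a syscall with up to six arguments. */
long raw_syscall6(long number, long a1, long a2, long a3,
                  long a4, long a5, long a6) {
#if defined(__linux__) && defined(__x86_64__)
	register long r10 __asm__ ("r10") = a4;  /* fourth argument: r10, not rcx */
	register long r8  __asm__ ("r8")  = a5;
	register long r9  __asm__ ("r9")  = a6;
	long result;
	__asm__ __volatile__ ("syscall"
		: "=a" (result)                        /* result comes back in rax */
		: "a" (number), "D" (a1), "S" (a2), "d" (a3),
		  "r" (r10), "r" (r8), "r" (r9)
		: "rcx", "r11", "memory");
	return result;
#else
	/* Assumption: on other targets, defer to the C library's generic wrapper. */
	return syscall(number, a1, a2, a3, a4, a5, a6);
#endif
}
```

Calls with three or fewer arguments never touch r10, so a helper that loaded rcx instead would appear to work until the first four-argument syscall.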

Linux syscall numbers for x86-64 are defined in arch/x86/entry/syscalls/syscall_64.tbl.

.data
	.set .L_STDOUT,        1
	.set .L_SYSCALL_EXIT,  60
	.set .L_SYSCALL_WRITE, 1
	.L_message:
		.ascii "Hello, world!\n"
		.set .L_message_len, . - .L_message

.text
	.global _start
	_start:
		# write(STDOUT, message, message_len)
		mov     $.L_SYSCALL_WRITE, %rax
		mov     $.L_STDOUT,        %rdi
		mov     $.L_message,       %rsi
		mov     $.L_message_len,   %rdx
		syscall

		# exit(0)
		mov     $.L_SYSCALL_EXIT, %rax
		mov     $0,               %rdi
		syscall

static linking

as --64 -o hello.o hello.s
ld -m elf_x86_64 -o hello hello.o
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
./hello
# Hello, world!

dynamic linking

as --64 -o hello.o hello.s
ld -m elf_x86_64 -o hello hello.o \
    --dynamic-linker /lib64/ld-linux-x86-64.so.2 \
    -l:ld-linux-x86-64.so.2
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, not stripped
ldd hello
#    /lib64/ld-linux-x86-64.so.2 (0x00007f472a831000)
#    linux-vdso.so.1 (0x00007ffe83d7a000)
./hello
# Hello, world!

Linux: ARM v6 (Little-Endian, EABI)

Linux syscall numbers for ARM are defined in arch/arm/tools/syscall.tbl.

.arch armv6
.data
	.set .L_STDOUT,        1
	.set .L_SYSCALL_EXIT,  1
	.set .L_SYSCALL_WRITE, 4
	.L_message:
		.ascii "Hello, world!\n"
	.set .L_message_len, . - .L_message

.text
	.global _start
	_start:
		@ write(STDOUT, message, message_len)
		mov %r7, #.L_SYSCALL_WRITE
		mov %r0, #.L_STDOUT
		ldr %r1, =.L_message
		mov %r2, #.L_message_len
		swi #0

		@ exit(0)
		mov %r7, #.L_SYSCALL_EXIT
		mov %r0, #0
		swi #0

static linking

as -EL -o hello.o hello.s
ld -m armelf_linux_eabi -o hello hello.o
file hello
# hello: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, not stripped
./hello
# Hello, world!

dynamic linking

as -EL -o hello.o hello.s
ld -m armelf_linux_eabi -o hello hello.o \
    --dynamic-linker /lib/ld-linux-armhf.so.3 \
    -l:ld-linux-armhf.so.3
file hello
# hello: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, not stripped
./hello
# Hello, world!

Darwin (MacOS X)

Note that I have left out the instructions to statically link binaries because they are documented as unsupported: Technical Q&A QA1118: Statically linked binaries on Mac OS X. Apple is also known to break the syscall ABI between MacOS versions, though it should be stable enough for the syscalls inherited from BSD.

The examples use lea because PIE addressing is required for -macosx_version_min 10.7 or later. Make sure this linker flag matches the .macosx_version_min value in the assembly, or the linker may reject your object code.

10.8 and later require linking with libSystem via ld -lSystem. Earlier versions don't need that link.

The default entry point changed from start to _main in 10.8. Use ld -e _main to build for earlier -macosx_version_min values.

Darwin: i386

.macosx_version_min 10, 8

.data
	.set L_STDOUT,        1
	.set L_SYSCALL_EXIT,  1
	.set L_SYSCALL_WRITE, 4
	L_message:
		.ascii "Hello, world!\n"
		.set L_message_len, . - L_message

.text
	.global _main
	_main:
		mov %eax, %esi

		# write(STDOUT, message, message_len)
		push $L_message_len
		lea  L_message-_main(%esi), %eax
		push %eax
		push $L_STDOUT
		push $0 # stack padding
		mov  $L_SYSCALL_WRITE, %eax
		int  $0x80
		add  $16, %esp

		# exit(0)
		push $0 # exit code
		push $0 # stack padding
		mov  $L_SYSCALL_EXIT, %eax
		int  $0x80

dynamic linking

as -arch i386 -o hello.o hello.s
ld -arch i386 -macosx_version_min 10.8 -lSystem -o hello hello.o
file hello
# hello: Mach-O executable i386
otool -L hello
# hello:
#     /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.60.2)
./hello
# Hello, world!

Darwin: x86-64

In 64-bit MacOS X, syscall numbers are divided into "classes". The syscalls inherited from BSD are in SYSCALL_CLASS_UNIX, starting at 0x2000000. See XNU header osfmk/mach/syscall_sw.h for details.
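As far as I can tell, the class is simply packed into the top byte of the number. The sketch below reproduces the arithmetic. The constant names mirror macros I recall from the XNU header but should be treated as assumptions, and darwin_unix_syscall is a made-up helper.

```c
#include <stdint.h>

/* Assumed values, recalled from XNU's syscall_sw.h; check the header before relying on them. */
enum {
	SYSCALL_CLASS_SHIFT = 24,
	SYSCALL_CLASS_UNIX  = 2    /* the class holding BSD-derived syscalls */
};

/* Hypothetical helper: combine the UNIX class with a BSD syscall number. */
uint32_t darwin_unix_syscall(uint32_t bsd_number) {
	uint32_t class_bits = (uint32_t)SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT;
	return class_bits | (bsd_number & 0x00FFFFFFu);
}
```

With BSD numbers 1 (exit) and 4 (write) this yields 0x2000001 and 0x2000004, the constants used in the listing that follows.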

.macosx_version_min 10, 8

.data
	.set L_STDOUT,        1
	.set L_SYSCALL_EXIT,  0x2000001
	.set L_SYSCALL_WRITE, 0x2000004
	L_message:
		.ascii "Hello, world!\n"
		.set L_message_len, . - L_message

.text
	.global _main
	_main:
		# write(STDOUT, message, message_len)
		mov     $L_SYSCALL_WRITE, %rax
		mov     $L_STDOUT,        %rdi
		lea     L_message(%rip),  %rsi
		mov     $L_message_len,   %rdx
		syscall

		# exit(0)
		mov     $L_SYSCALL_EXIT, %rax
		mov     $0,              %rdi
		syscall

dynamic linking

as -arch x86_64 hello.s -o hello.o
ld -arch x86_64 -o hello hello.o \
    -macosx_version_min 10.8 -lSystem
file hello
# hello: Mach-O 64-bit executable x86_64
otool -L hello
# hello:
#     /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1238.60.2)
./hello
# Hello, world!

FreeBSD

The list of system calls is defined in sys/kern/syscalls.master. Syscall numbers appear to be the same across hardware platforms.

FreeBSD: i386

int $0x80 appears to be the only supported syscall mechanism for FreeBSD on i386. There is a vDSO at sys/sys/vdso.h but it doesn't contain a Linux-style generic syscall trampoline.

.data
	.set .L_STDOUT,        1
	.set .L_SYSCALL_EXIT,  1
	.set .L_SYSCALL_WRITE, 4
	.L_message:
		.ascii "Hello, world!\n"
		.set .L_message_len, . - .L_message

.text
	.global _start
	_start:
		# write(STDOUT, message, message_len)
		push $.L_message_len
		push $.L_message
		push $.L_STDOUT
		push $0 # stack padding
		mov  $.L_SYSCALL_WRITE, %eax
		int  $0x80
		add  $16, %esp

		# exit(0)
		push $0 # exit code
		push $0 # stack padding
		mov  $.L_SYSCALL_EXIT, %eax
		int  $0x80

static linking

as --32 -o hello.o hello.s
ld -m elf_i386_fbsd -o hello hello.o
file hello
# hello: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), statically linked, not stripped
./hello
# Hello, world!

dynamic linking

as --32 -o hello.o hello.s
ld -m elf_i386_fbsd -o hello hello.o \
    --dynamic-linker=/libexec/ld-elf.so.1 \
    -L/libexec -l:ld-elf.so.1 \
    --hash-style=gnu
file hello
# hello: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, not stripped
ldd hello
# hello:
#     /libexec/ld-elf.so.1 (0x2806e000)
./hello
# Hello, world!

FreeBSD: x86-64

Note that older FreeBSD kernels contain a bug in syscall handling that can cause crashes when using the SYSCALL instruction. Compilers targeting these old versions should use INT $0x80 instead.

.data
	.set L_STDOUT,        1
	.set L_SYSCALL_EXIT,  1
	.set L_SYSCALL_WRITE, 4
	L_message:
		.ascii "Hello, world!\n"
		.set L_message_len, . - L_message

.text
	.global _start
	_start:
		# write(STDOUT, message, message_len)
		mov     $L_SYSCALL_WRITE, %rax
		mov     $L_STDOUT,        %rdi
		mov     $L_message,       %rsi
		mov     $L_message_len,   %rdx
		syscall

		# exit(0)
		mov     $L_SYSCALL_EXIT, %rax
		mov     $0,              %rdi
		syscall

static linking

as --64 -o hello.o hello.s
ld -m elf_x86_64_fbsd -o hello hello.o
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), statically linked, not stripped
./hello
# Hello, world!

dynamic linking

as --64 -o hello.o hello.s
ld -m elf_x86_64_fbsd -o hello hello.o \
    --dynamic-linker=/libexec/ld-elf.so.1 \
    -L/libexec -l:ld-elf.so.1 \
    --hash-style=gnu
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, not stripped
ldd hello
# hello:
#     /libexec/ld-elf.so.1 (0x800822000)
./hello
# Hello, world!

SunOS 4.x (Solaris 1.x)

SunOS: SPARC v7

.seg "data"
	L_STDOUT        = 1
	L_SYSCALL_EXIT  = 1
	L_SYSCALL_WRITE = 4
	L_message:
		.ascii "Hello world!\n"
		L_message_len = . - L_message

.seg "text"
	.global _start
	_start:
		! write(STDOUT, message, message_len)
		mov L_SYSCALL_WRITE, %g1
		mov L_STDOUT,        %o0
		set L_message,       %o1
		set L_message_len,   %o2
		ta  0

		! exit(0)
		mov L_SYSCALL_EXIT, %g1
		mov 0,              %o0
		ta  0

static linking

as -o hello.o hello.s
ld -e _start -o hello hello.o
file hello
# hello:          sparc demand paged executable not stripped
ldd hello
# hello: statically linked
./hello
# Hello world!

Inline Assembly

Higher-level languages sometimes let assembly be embedded directly in their source code. The exact syntax is language- and compiler-specific.

I used x86-64 Linux as the target platform for these examples, but they should work equally well on other platforms if the appropriate instructions and registers are substituted.

A note on "clobbering": compilers require the inline assembly block to declare which CPU registers _other than the inputs and outputs_ may be modified. The exact set of clobbered registers is compiler-, platform-, and OS-specific[5]. Linux on x86-64 clobbers rcx and r11 (and maybe r10, as claimed by osdev?).

Linux: x86-64 (GNU C)

See Using Assembly Language with C in the GCC manual for an overview, Machine Constraints for architecture-specific codes to pass parameters into an assembly block, and Local Register Variables for details on assigning values to specific registers.

I couldn't find documentation on which registers GNU C's inline assembly clobbers, if any.

static const int STDOUT = 1;
static const int SYSCALL_EXIT = 60;
static const int SYSCALL_WRITE = 1;
static const char message[] = "Hello, world!\n";
/* sizeof() counts the trailing NUL, which must not be written. */
static const int message_len = sizeof(message) - 1;

void _start(void) {
	{   /* write(STDOUT, message, message_len) */
		register long        rax __asm__ ("rax") = SYSCALL_WRITE;
		register long        rdi __asm__ ("rdi") = STDOUT;
		register const char *rsi __asm__ ("rsi") = message;
		register long        rdx __asm__ ("rdx") = message_len;
		/* rax carries the syscall number in and the result out. */
		__asm__ __volatile__ ("syscall"
			: "+r" (rax)
			: "r" (rdi), "r" (rsi), "r" (rdx)
			: "rcx", "r11", "memory");
	}

	{   /* exit(0) */
		register long rax __asm__ ("rax") = SYSCALL_EXIT;
		register long rdi __asm__ ("rdi") = 0;
		__asm__ __volatile__ ("syscall"
			:
			: "r" (rax), "r" (rdi)
			: "rcx", "r11", "memory");
	}
}

static linking

gcc -m64 -c -o hello.o hello.c
ld -m elf_x86_64 -o hello hello.o
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
./hello
# Hello, world!

dynamic linking

gcc -m64 -c -o hello.o hello.c
ld -m elf_x86_64 -o hello hello.o \
    --dynamic-linker /lib64/ld-linux-x86-64.so.2 \
    -l:ld-linux-x86-64.so.2
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, not stripped
./hello
# Hello, world!

Linux: x86-64 (LLVM IR)

See Inline Assembler Expressions in the LLVM IR reference for an overview. I'm using named registers in the input list instead of moving things around in the ASM block, so that LLVM will handle the register allocation.

LLVM documentation says its ASM calls clobber registers dirflag, fpsr, and flags in addition to any registers clobbered by the kernel.

@.message = internal constant [14 x i8] c"Hello, world!\0A"

define void @_start() {
	%message_ptr = getelementptr [14 x i8], [14 x i8]* @.message , i64 0, i64 0

	; write(STDOUT, message, message_len)
	call i64 asm sideeffect "syscall",
		"={rax},{rax},{rdi},{rsi},{rdx},~{rcx},~{r11},~{dirflag},~{fpsr},~{flags}"
		( i64 1            ; {rax} SYSCALL_WRITE
		, i64 1            ; {rdi} STDOUT
		, i8* %message_ptr ; {rsi} message
		, i64 14           ; {rdx} message_len
		)

	; exit(0)
	call i64 asm sideeffect "syscall",
		"={rax},{rax},{rdi},~{rcx},~{r11},~{dirflag},~{fpsr},~{flags}"
		( i64 60 ; {rax} SYSCALL_EXIT
		, i64 0  ; {rdi} exit_code
		)

	ret void
}

static linking

llc -o hello.o hello.ll -filetype=obj
ld -m elf_x86_64 -o hello hello.o
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
./hello
# Hello, world!

dynamic linking

llc -o hello.o hello.ll -filetype=obj -relocation-model=pic
ld -m elf_x86_64 -o hello hello.o \
    --dynamic-linker /lib64/ld-linux-x86-64.so.2 \
    -l:ld-linux-x86-64.so.2
file hello
# hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, not stripped
./hello
# Hello, world!

  1. LKML: Intel P6 vs P7 system call performance (Mike Hayward)

  2. LWN: How to speed up system calls

  3. manpage vdso(7)

  4. manpage getauxval(3)

  5. See the System V ABI for details.