First Impressions of Rust

I've been wanting to write a big project in Rust for a while as a learning exercise, and actually started one in late 2018 (a FUSE server implementation). But then life happened, I got busy, and it never went anywhere. Due to certain world circumstances I'm currently spending a lot of time indoors, so rust-fuse (docs) now exists and is good enough to write basic hello-world filesystems. I plan to polish it up a bit more with the goal of releasing a v1.0 that supports the same use cases as libfuse.

I took some notes along the way about things that struck me as especially good or bad. Overall I quite like Rust the language, have mixed feelings about the quality of ancillary tooling, and have strong objections to some decisions made by the packaging system (Cargo + crates.io).

Background

I've been programming professionally for 15 years, primarily network servers and GUIs on Linux. Between roughly 2009 and 2015 I experimented with using Haskell for systems programming, writing several projects in pure Haskell (haskell-dbus, Anansi) and as bindings (haskell-ncurses, haskell-cpython). However, I couldn't achieve the sorts of reliability improvements over bread-and-butter C++ that I had hoped for:

  • Haskell has a lot of tools for reasoning about the structure of computation, notably monads for declarative I/O, but it doesn't do much to help the programmer with non-algorithmic concerns such as memory lifetimes. I spent a lot of time debugging dangling pointers and race conditions.
  • I found it very difficult to write Haskell code that could run as fast as C. Avoiding allocation, auto-boxing, etc. felt like it required a deep knowledge of undocumented or unspecified GHC behavior.

In late 2015 I started rough sketches for a new language, Funk, that would combine the type-safety of Haskell with the low-level precision of C/C++. Funk was strongly influenced by Google's internal dialect of C++, which uses smart pointers and sum types (e.g. StatusOr<T>) to improve memory safety – many of its features later became part of the C++11 and C++14 standards. To this foundation I bolted on Haskell-style typeclasses and modules, then started writing a Funk-to-C translator based on Vala.

At some point I was looking around for inspiration on how to handle memory allocation (I planned to use scoped arenas as the fundamental dynamic memory system) and I discovered Rust. Here was a language that was solving the same problems as Funk but (1) better designed (2) already implemented and (3) supported by an entire team of compiler experts. So that was that, Funk went to /dev/null and I logged a TODO to learn Rust.

The Rust language

It shouldn't come as a surprise that someone looking for a cross between C++ and Haskell would like Rust, but I want to be clear: I really really enjoy using Rust. It is nearly everything I want in a systems programming language, and the few parts it's missing are due to legitimate technical difficulty. The amount of careful thought I've seen put into its design – crafting async/await to work in no_std environments, or the new inline assembly syntax – has produced a language that is not only better than what I could have designed, it's better along axes I was not even aware existed prior to reading the Rust RFCs.

The "nightly" release channel is an excellent idea that I wish more infrastructure software made use of. Stabilizing individual features on their own schedules lets the compiler maintain a blistering release cadence (stable releases every six weeks!). Users are empowered to choose their own preferred point on the maintenance/velocity curve, opting in to higher upgrade costs in exchange for early access to new features. The "editions" system goes a bit further, derisking backwards-incompatible syntax changes that would have stymied C++ for decades (see: trigraphs).

Type system

Rust has a reasonable amount of Haskell-style type programming, though I wouldn't mind a bit more. Some parts of its type system are limited in non-intuitive ways – for example only lifetime-kinded type parameters can be universally quantified in a trait bound. I hit a lot of compiler errors that recognized exactly what I wanted to do but wouldn't let me do it.
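To make that limitation concrete, here is a minimal sketch of the one kind of universal quantification a trait bound does accept: a higher-ranked bound over a lifetime. The names apply_to_local and first_word are invented for this example.

```rust
// `for<'a>` quantifies over every lifetime 'a, so `f` must accept a borrow
// of any duration, including a borrow of a value created inside this function.
fn apply_to_local<F>(f: F) -> usize
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    let local = String::from("hello world");
    f(&local).len()
}

// Hypothetical helper: returns the text before the first space.
fn first_word(s: &str) -> &str {
    s.split(' ').next().unwrap_or("")
}

// There is no type-level counterpart on stable Rust: a bound such as
// `F: for<T> Fn(T) -> T` is rejected, because only lifetimes may appear
// inside `for<...>`.
```

Calling apply_to_local(first_word) type-checks because first_word works for any input lifetime. The commented-out bound at the end is the shape that is currently unavailable.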

I wish Rust's type system supported:

  • Closed-world ("sealed") traits. Rust's rules against private types in the public API are good civilization but they make it difficult to define pseudo-private traits like Mount that I want users to name but not implement or call into.
  • Associated types in structs. Rust lets traits have associated types, and structs can have associated values, but there's no equivalent to the nested type names found in C++ or Java.
  • Very basic dependent typing, or maybe something like Eiffel's contracts, for the purpose of eliminating array bounds checks. I'd like to be able to say "this function accepts a &[u8] of at least size_of::<SomeType>() bytes" so I can do safe unchecked byte poking.
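For the first item in that list, the closest approximation I know of is the "sealed trait" convention: a public trait with a supertrait that lives in a private module. The sketch below uses a hypothetical LinuxMount type and a simplified Mount trait; neither is taken from rust-fuse.

```rust
mod sealed {
    // Declared `pub` so it can appear in a public trait's bounds, but
    // unreachable from other crates because the enclosing module is private.
    pub trait Sealed {}
}

/// Simplified stand-in for a trait like `Mount`. Downstream code can name it
/// in bounds, but cannot implement it without access to `sealed::Sealed`.
pub trait Mount: sealed::Sealed {
    fn mount_target(&self) -> &str;
}

/// Hypothetical implementor defined inside the crate.
pub struct LinuxMount {
    target: String,
}

impl LinuxMount {
    pub fn new(target: &str) -> Self {
        LinuxMount { target: target.to_string() }
    }
}

impl sealed::Sealed for LinuxMount {}

impl Mount for LinuxMount {
    fn mount_target(&self) -> &str {
        &self.target
    }
}

/// Users can write generic code against `Mount` even though they cannot
/// add new implementations of it.
pub fn describe<M: Mount>(m: &M) -> String {
    format!("mounting at {}", m.mount_target())
}
```

This only covers half of the wish. It stops outside implementations, but the trait's methods remain callable by anyone who can name the trait, and the restriction is a convention rather than something the type system states directly.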

Standard library

There's a lot of standard UNIX functionality that's missing from the Rust standard library. Some of it is more-or-less available from separate packages like nix, but I shouldn't have to depend on four crates plus a C compiler to get access to getuid(). I shouldn't have to depend on anything to get the definition of ENOSYS or the size of c_ulong. Go is the gold standard here – it can cross-compile to a Linux target from macOS using its own copies of the Linux syscall table – and even Haskell has Foreign.C.Types.

A std::os::unix without getuid() is incomplete but can be worked around with a small extern "C" block. Much worse is the lack of macro-dependent functions like recvmsg(), which is not a great API to begin with, or functions with OS-dependent arity like mount(). Rust is not averse to providing clean wrappers around the OS library – the std::fs and std::process modules contain little else – so it's frustrating to see these very basic functions left out.
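The getuid() workaround can be sketched as follows. It assumes a Unix-like target whose C library exports getuid and where uid_t is a 32-bit unsigned integer (the case on Linux and macOS); the wrapper name current_uid is made up for this example.

```rust
// Declares the C library's getuid() by hand. Assumption: uid_t is u32 on
// the target platform. Edition 2024 requires spelling this block as
// `unsafe extern "C"`.
extern "C" {
    fn getuid() -> u32;
}

/// Hypothetical safe wrapper around the raw declaration.
pub fn current_uid() -> u32 {
    // SAFETY: getuid() takes no arguments, has no preconditions, and is
    // specified to always succeed.
    unsafe { getuid() }
}
```

The declaration is short, but it hardcodes a platform-specific type width, which is exactly the sort of detail a standard library binding would take care of.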

Tooling

rustdoc

I categorize documentation generators into two basic groups:

  • First is the Sphinx group, which consumes prose and uses embedded pragmas to reference symbols of the library being documented. The output layout tends to be textbook-like, containing long "chapters" that might cover entire modules in one HTML file. Sphinx-style docs are popular among Python programmers.
  • Second is the Doxygen group, which consumes source code and generates a rigidly-structured catalog of symbols with optional attached prose. The output feels more like an encyclopedia or reference manual.

rustdoc is obviously in the second category. It is designed to consume doc comments, which are special-cased by the Rust compiler, and produces output closely matching the structure of the exported API. At this task rustdoc does a reasonable job: the page layout is navigable, the markup format (rustdoc uses Markdown) isn't great but it could be worse, and it doesn't hardcode absolute file paths into the output like Haddock.
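The special-casing amounts to this: the compiler treats a /// comment as shorthand for a #[doc = "..."] attribute on the item that follows, and rustdoc reads those attributes. A small sketch with two made-up functions that carry identical documentation:

```rust
/// Returns the number of bytes in `name`.
pub fn sugared(name: &str) -> usize {
    name.len()
}

// The same documentation written as the attribute that the comment form
// expands to. The leading space after the opening quote mirrors the space
// that follows `///` above.
#[doc = " Returns the number of bytes in `name`."]
pub fn desugared(name: &str) -> usize {
    name.len()
}
```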

Some of its annotations, like whether a symbol is OS-specific (rust-lang/rust#43781), are gated to the Nightly toolchain. It's not obvious to me why they do this – it's a documentation generator, why does it care what version of the Rust compiler I'm using? What's more, some of its functionality is reserved for the standard library only. I can't mark fields as unstable (subject to change in future library versions) because that annotation is based on the #[unstable] attribute, which the compiler reserves for its own use. Ditto for annotations about which version a symbol was added in. If I'm going to use a Doxygen-group tool then I don't want it to get too fussy about what libraries it's documenting.

rustfmt

Something like a cross between gofmt, clang-format, and GNU indent. It has a lot of configuration options but all the interesting ones are gated to Nightly, and most of those are much less useful than you might expect.

As a representative sample, consider rustfmt's handling of hard tabs. Given the following input there are two basic ways you might use hard tabs to indent it, depending on whether struct value alignment should apply to nested structs:

MyStruct{
  field_with_long_name: (some_big_complex_variable_name + another_big_complex_variable_name),
  another_field: 123,
  nested_struct: &NestedStruct{
    nested_struct_field: 456,
  },
  final_field: 123,
}

The first is to treat the nested struct as a "break" in the alignment (gofmt does this). I've drawn the tabs as ████ for clarity:

MyStruct{
████field_with_long_name: (some_big_complex_variable_name
████                       + another_big_complex_variable_name),
████another_field:        123,
████nested_struct: &NestedStruct{
████████nested_struct_field: 456,
████},
████final_field: 123,
}

The second is to align all the values, including the nested struct, and introduce a nested layer of tabs:

MyStruct{
████field_with_long_name: (some_big_complex_variable_name
████                       + another_big_complex_variable_name),
████another_field:        123,
████nested_struct:        &NestedStruct{
████                      ████nested_struct_field: 456,
████                      },
████final_field:          123,
}

But what rustfmt produces is an indecisive and poorly formatted combo of the two – it doesn't even properly align the parenthesized expression after line-breaking it:

MyStruct {
████field_with_long_name: (some_big_complex_variable_name
████████+ another_big_complex_variable_name),
████another_field:        123,
████nested_struct:        &NestedStruct {
████████nested_struct_field: 456,
████},
████final_field:          123,
}

I eventually gave up on trying to make the formatted rust-fuse code look pretty, and settled for "consistent".

Cargo and crates.io

While the Rust language feels carefully designed to combine the best parts of multiple popular and interesting languages, Rust's default build system (Cargo) and package repository (crates.io) are the opposite. They combine the worst parts of Cabal/Hackage and NPM, resulting in a user experience that is somehow inferior to both.

Package naming

crates.io has no namespacing. If a user uploads a package named fuse, that name is taken forever and no other person can upload a package named fuse unless the first developer transfers ownership. It so happens that someone did in fact upload crates.io/crates/fuse in 2014 (last updated: 2017), which means I'm going to have to publish mine under some stupid codename or contrived rusty-libfuse-for-rust-lib nonsense.

How did this happen? It's not like package registries are a new invention – PyPI launched in 2003, and CPAN has been running since 1995 (!). NPM has had optional namespaces ("scopes") since at least 2014.

Go demonstrates how to do distributed package naming well. A Go package is identified by a hierarchical path rooted at a DNS domain, which both solves the issue of ownership (defer to DNS) and lets big shared hosting providers like GitHub cleanly subdivide their namespace. If Cargo had done the same we might have package names like "github.com/rust-lang/git2-rs", which while not great at least avoids staking a claim on the very concept of Git.

But since crates.io is centralized, it can be terser than Go. Cargo could use crates.io as the default, using the presence of a period to distinguish non-default registries in Cargo.toml. And if you combined it with NPM's sigils, the official libgit2 binding "@rust/git2" could be registered at the same time as Jane Doe's experimental "~jdoe/git2" package, and could live on the same Internet as "example.com/rust-stuff/git2". Everyone would have the chance to contribute their code to the commons under a reasonable name.
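As a rough sketch of how such names could be told apart, the classifier below follows the rules described above: an "@" or "~" sigil selects a scope on the default registry, a period in the first path segment selects an external registry, and anything else is a bare name. The PackageName type and the parse_package_name function are hypothetical; nothing like them exists in Cargo.

```rust
/// Hypothetical classification of a package name under the proposed scheme.
#[derive(Debug, PartialEq)]
enum PackageName<'a> {
    /// "@org/name": organization scope on the default registry.
    OrgScoped { org: &'a str, name: &'a str },
    /// "~user/name": personal scope on the default registry.
    UserScoped { user: &'a str, name: &'a str },
    /// "host.tld/path": the period in the first segment marks another registry.
    External { host: &'a str, path: &'a str },
    /// "name": unscoped package on the default registry.
    Bare(&'a str),
}

/// Splits "scope/name", rejecting empty halves.
fn split_scope<'a>(rest: &'a str) -> Option<(&'a str, &'a str)> {
    let (scope, name) = rest.split_once('/')?;
    if scope.is_empty() || name.is_empty() {
        None
    } else {
        Some((scope, name))
    }
}

fn parse_package_name<'a>(input: &'a str) -> Option<PackageName<'a>> {
    if input.is_empty() {
        return None;
    }
    if let Some(rest) = input.strip_prefix('@') {
        return split_scope(rest).map(|(org, name)| PackageName::OrgScoped { org, name });
    }
    if let Some(rest) = input.strip_prefix('~') {
        return split_scope(rest).map(|(user, name)| PackageName::UserScoped { user, name });
    }
    match input.split_once('/') {
        // A dotted first segment is treated as a DNS host.
        Some((host, path)) if host.contains('.') && !path.is_empty() => {
            Some(PackageName::External { host, path })
        }
        // A slash without a sigil or a dotted host is not a valid name here.
        Some(_) => None,
        // A dotted name without a path has a registry but no package.
        None if input.contains('.') => None,
        None => Some(PackageName::Bare(input)),
    }
}
```

Under these assumed rules, "@rust/git2", "~jdoe/git2", and "example.com/rust-stuff/git2" all parse to distinct variants, so the three packages can coexist without colliding on the name git2.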

Single-crate packaging

Cargo's unit of distribution is the crate, which is a problem because Rust's compilation unit is also the crate. Large libraries are easier to work on when parts of the build graph can be cached, but if you try to split up a library you pretty immediately run into problems:

  • Cargo won't let you define more than one [lib] per Cargo.toml, so what would be a minor refactoring requires converting the project repository into a "workspace". As a side effect this breaks many common commands, for example cargo test must be replaced with cargo test --all.
  • Cargo can handle release archives containing multiple crates (via path dependencies), but crates.io rejects uploads containing crates with path deps. This leads to an explosion of crate registrations, as each project needs to upload its internal organs as separate packages. Good luck with figuring out semver for mypkg-internal-macros – might as well version them all "v0.0.$(date +%s)".

When I was writing Rust without Cargo I was confused about why people complained about slow build times, but now I get it. Of course build times are slow if changing one line of a leaf file requires rebuilding dozens of modules. I've found Bazel and rules_rust provide a good alternative to Cargo, since Bazel can twist your build into any DAG you want, but most Rust users are unlikely to be excited about injecting 50MB of Java build system into the middle of their workflow.