Bazel School: Toolchains

I've recently been using Bazel as a multi-platform distributed build system. Bazel itself supports this pretty well, but many of the user-contributed extension libraries don't make good use of Bazel's toolchains and therefore break when multiple OSes are involved in a build. I hope the situation can be improved by documenting nascent best practices.

This page is a bit advanced. It assumes background knowledge in cross-compilation, plus experience with Bazel's Starlark extension language, build rules, and repository definitions. Most users of Bazel shouldn't need to care about the details of compiler toolchains, but this is important stuff for maintainers of language rules.

Constraints

Bazel's platform and toolchain design is based on constraints, which are simple text key/value pairs. Keys are defined by constraint_setting, and values by constraint_value. Settings and values are true targets, which means they're addressed by label, obey visibility, and can be aliased.

A couple of basic constraint settings, each with a handful of values, come predefined in @bazel_tools//platforms:

@bazel_tools//platforms:cpu
    @bazel_tools//platforms:arm
    @bazel_tools//platforms:ppc
    @bazel_tools//platforms:s390x
    @bazel_tools//platforms:x86_32
    @bazel_tools//platforms:x86_64

@bazel_tools//platforms:os
    @bazel_tools//platforms:freebsd
    @bazel_tools//platforms:linux
    @bazel_tools//platforms:osx
    @bazel_tools//platforms:windows

Note the limited selection and lack of precision. These definitions are (as of Bazel 0.13) useful only for getting started. Most language rules will want to define their own – see @io_bazel_rules_go//go/toolchain:toolchains.bzl for an example of custom values for the built-in settings.

Platforms

Upstream docs:

A platform is a named set of constraint values (see above), plus some other metadata that I'm going to skip because it's part of the not-fully-implemented remote execution API. A platform can contain any number of constraint values, but at most one constraint value per constraint setting (i.e. you can't have a platform with two CPU types). Be specific – Autoconf's "GNU Triplets" are a good model to imitate here.

# platforms/BUILD

platform(
    name = "x86_64-apple-darwin",
    constraint_values = [
        "@bazel_tools//platforms:osx",
        "@bazel_tools//platforms:x86_64",
    ],
)

platform(
    name = "i686-linux-gnu",
    constraint_values = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
)

The rest of this page will use the standard constraint definitions built into Bazel. Custom constraints and platforms are useful if you need to constrain on other dimensions, such as CPU vendor or libc version.
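As a rough sketch of what a custom dimension could look like, the macro below declares a libc constraint setting with two values, plus a platform that uses one of them. The file name, the libc setting, the glibc/musl values, and the x86_64-linux-musl platform are all made up for illustration; only the @bazel_tools labels come from the list above. I'm also assuming the native module exposes constraint_setting, constraint_value, and platform to macros the same way it does other built-in rules.

```python
# platforms/defs.bzl (Starlark; file and target names are hypothetical)

LIBC_VALUES = ["glibc", "musl"]

def platform_constraint_values(os, cpu, libc):
    """Returns the constraint value labels for one platform.

    Each argument picks a value for a different constraint setting,
    so the result never holds two values for the same setting.
    """
    return [
        "@bazel_tools//platforms:" + os,
        "@bazel_tools//platforms:" + cpu,
        ":" + libc,
    ]

def declare_demo_platforms():
    """Call once from platforms/BUILD to declare the targets."""

    # The key: a new dimension that platforms can be constrained on.
    native.constraint_setting(name = "libc")

    # The values: one target per libc implementation.
    for libc in LIBC_VALUES:
        native.constraint_value(
            name = libc,
            constraint_setting = ":libc",
        )

    native.platform(
        name = "x86_64-linux-musl",
        constraint_values = platform_constraint_values("linux", "x86_64", "musl"),
    )
```

Building the label list in a small helper keeps the "one value per setting" rule easy to see: the function takes exactly one argument per setting.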

Defining Toolchains

Upstream docs:

To work with cross-compilation, toolchains need to (1) be capable of generating non-native output binaries and (2) define their Bazel constraints.

Toolchain Types

Each category of toolchain is identified by a toolchain type, which is a string in the format of a build label. There is no requirement that the value actually match any defined label. I recommend using a @-prefixed toolchain type, to avoid potential conflicts in workspaces with multiple language rules loaded.

ToolchainInfo

The ToolchainInfo provider is how your rules expose toolchain configuration to Bazel. There are no special requirements about the values you can put in, so feel free to use whatever makes sense for your language.

Starlark doesn't have a public/private distinction for struct attributes, so the Python convention of underscore-prefixed names for private attributes is borrowed. It's easy for rule implementations to get access to the ToolchainInfo for any registered toolchain, so be clear in your docs which attributes are part of your public API.

First you define a rule type for your toolchain info:

# demo_toolchain.bzl

DEMO_TOOLCHAIN = "@rules_demo//:demo_toolchain_type"

def _demo_toolchain_info(ctx):
    # Everything returned here is visible to rules that use the toolchain.
    return [
        platform_common.ToolchainInfo(
            compiler = ctx.attr.compiler,
            cflags = ctx.attr.cflags,
        ),
    ]

demo_toolchain_info = rule(
    _demo_toolchain_info,
    attrs = {
        # Public (no underscore) so BUILD files can point at a different compiler.
        "compiler": attr.label(
            executable = True,
            default = "//:demo_compiler",
            cfg = "host",
        ),
        "cflags": attr.string_list(),
    },
)

Then use it to create toolchain info targets, one for each unique configuration you might want to build with:

# BUILD

load(":demo_toolchain.bzl", "DEMO_TOOLCHAIN", "demo_toolchain_info")

demo_toolchain_info(
    name = "demo_toolchain_info/i686-linux-gnu",
    cflags = ["--target-os=linux", "--target-arch=i686"],
)

demo_toolchain_info(
    name = "demo_toolchain_info/x86_64-linux-gnu",
    cflags = ["--target-os=linux", "--target-arch=amd64"],
)

Registration

Once you've got your ToolchainInfo rules defined, the next step is to register them. This is where the info is associated with the toolchain type and the constraint values so Bazel can auto-detect which toolchains are usable on a particular platform.

# BUILD

toolchain(
    name = "demo_toolchain_linux_x86_32",
    exec_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
    target_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
    toolchain = ":demo_toolchain_info/i686-linux-gnu",
    toolchain_type = DEMO_TOOLCHAIN,
)

toolchain(
    name = "demo_toolchain_linux_x86_64",
    # [...]
)

Finally, the toolchains Bazel can use are passed to register_toolchains in your WORKSPACE. Usually this is done in a helper macro defined in the language rules, so that both the toolchain() rules and register_toolchains(...) args can be generated by the same logic.

# WORKSPACE

register_toolchains(
    "//:demo_toolchain_linux_x86_32",
    "//:demo_toolchain_linux_x86_64",
)
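Here's a minimal sketch of what such a helper might look like. The file name, the DEMO_PLATFORMS table, and all of the function names are hypothetical and not taken from any real rule set; the toolchain and info target names are the ones used in the examples above. In a real .bzl file you'd load DEMO_TOOLCHAIN from demo_toolchain.bzl instead of repeating it.

```python
# toolchains.bzl (Starlark; a hypothetical helper for illustration)

DEMO_TOOLCHAIN = "@rules_demo//:demo_toolchain_type"  # same value as in demo_toolchain.bzl

# (GNU-style name, OS constraint, CPU constraint) for each supported platform.
DEMO_PLATFORMS = [
    ("i686-linux-gnu", "linux", "x86_32"),
    ("x86_64-linux-gnu", "linux", "x86_64"),
]

def _constraints(os, cpu):
    return [
        "@bazel_tools//platforms:" + os,
        "@bazel_tools//platforms:" + cpu,
    ]

def demo_toolchain_name(os, cpu):
    return "demo_toolchain_%s_%s" % (os, cpu)

def demo_toolchain_labels(package = "//"):
    """Labels for register_toolchains(), one per entry in DEMO_PLATFORMS."""
    return [
        "%s:%s" % (package, demo_toolchain_name(os, cpu))
        for (_, os, cpu) in DEMO_PLATFORMS
    ]

def declare_demo_toolchains():
    """Call from the BUILD file that holds the demo_toolchain_info targets."""
    for (triplet, os, cpu) in DEMO_PLATFORMS:
        native.toolchain(
            name = demo_toolchain_name(os, cpu),
            exec_compatible_with = _constraints(os, cpu),
            target_compatible_with = _constraints(os, cpu),
            toolchain = ":demo_toolchain_info/" + triplet,
            toolchain_type = DEMO_TOOLCHAIN,
        )

def demo_register_toolchains():
    """Call from WORKSPACE."""
    native.register_toolchains(*demo_toolchain_labels())
```

Because both macros read the same DEMO_PLATFORMS table, adding a platform is a one-line change and the registered labels can't drift out of sync with the declared targets.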

Using Toolchains

Rules can declare which types of toolchains they depend on, like "needs a C++ compiler". When defining the rule, set the toolchains param to all the toolchain types that will be needed to run the action. Then within the implementation, fetch the ToolchainInfo values (the same ones defined in the toolchain info rule) and inspect the content to implement your build.

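# rules.bzl

load(":demo_toolchain.bzl", "DEMO_TOOLCHAIN")

def _demo_rule(ctx):
    # Look up the toolchain Bazel selected for this toolchain type.
    tc = ctx.toolchains[DEMO_TOOLCHAIN]
    print("toolchain: %s %r" % (tc.compiler, tc.cflags))

demo_rule = rule(
    _demo_rule,
    toolchains = [DEMO_TOOLCHAIN],
)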
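Printing is enough to prove the lookup works, but a real rule would feed the toolchain into an action. The sketch below shows one way that might look. It assumes a rule declared like demo_rule with an extra srcs attribute (a label list that allows files), and it assumes the demo compiler accepts an "-o" flag; both are inventions for this example, as are the function names and the mnemonic.

```python
# rules.bzl (Starlark; the srcs attribute and the compiler's "-o" flag are assumptions)

DEMO_TOOLCHAIN = "@rules_demo//:demo_toolchain_type"  # same value as in demo_toolchain.bzl

def demo_compile_args(cflags, src_paths, out_path):
    """Builds the command line: toolchain flags first, then output and inputs."""
    return cflags + ["-o", out_path] + src_paths

def _demo_binary(ctx):
    tc = ctx.toolchains[DEMO_TOOLCHAIN]
    out = ctx.actions.declare_file(ctx.label.name)
    ctx.actions.run(
        # tc.compiler is the label attribute stored by demo_toolchain_info.
        executable = tc.compiler[DefaultInfo].files_to_run,
        arguments = demo_compile_args(
            tc.cflags,
            [f.path for f in ctx.files.srcs],
            out.path,
        ),
        inputs = ctx.files.srcs,
        outputs = [out],
        mnemonic = "DemoCompile",
    )
    return [DefaultInfo(files = depset([out]))]
```

The rule never mentions a platform. The target-specific flags arrive through tc.cflags, so the same implementation works for every registered toolchain.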

Cross-Compilation

Toolchains can have different exec_compatible_with and target_compatible_with attrs. The execution compatibility describes the platform that runs the build actions (i.e. the worker), and the target compatibility describes the platforms that the toolchain can generate output for.

Here's the definition of a cross-compiling toolchain that runs on 32-bit Linux but generates output for 64-bit Linux:

# BUILD

load(":demo_toolchain.bzl", "DEMO_TOOLCHAIN", "demo_toolchain_info")

demo_toolchain_info(
    name = "demo_toolchain_info_linux_x86_32_cross64",
    compiler = "@demo_prebuilt_compiler_linux_x86_32//:demo_compiler",
    cflags = ["--target-os=linux", "--target-arch=amd64"],
)

toolchain(
    name = "demo_toolchain_linux_x86_32_cross64",
    exec_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
    target_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_64",
    ],
    toolchain = ":demo_toolchain_info_linux_x86_32_cross64",
    toolchain_type = DEMO_TOOLCHAIN,
)
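To make the matching rule concrete, here's a toy model of how I understand resolution to work: a toolchain is a candidate when every constraint in exec_compatible_with appears on the execution platform and every constraint in target_compatible_with appears on the target platform. This is plain Starlark written for illustration, not Bazel's actual resolution code; the function names and the dict layout are made up.

```python
# resolution_model.bzl (Starlark; an illustrative model, not Bazel's implementation)

def _satisfied(required, platform_values):
    """True if every required constraint value is present on the platform."""
    for value in required:
        if value not in platform_values:
            return False
    return True

def select_toolchain(toolchains, exec_platform, target_platform):
    """Returns the name of the first compatible toolchain, or None.

    toolchains: list of dicts with "name", "exec_compatible_with" and
        "target_compatible_with" keys (lists of constraint value labels).
    exec_platform, target_platform: lists of constraint value labels.
    """
    for tc in toolchains:
        if (_satisfied(tc["exec_compatible_with"], exec_platform) and
            _satisfied(tc["target_compatible_with"], target_platform)):
            return tc["name"]
    return None

LINUX_32 = ["@bazel_tools//platforms:linux", "@bazel_tools//platforms:x86_32"]
LINUX_64 = ["@bazel_tools//platforms:linux", "@bazel_tools//platforms:x86_64"]

DEMO_TOOLCHAINS = [
    {
        "name": "demo_toolchain_linux_x86_32",
        "exec_compatible_with": LINUX_32,
        "target_compatible_with": LINUX_32,
    },
    {
        "name": "demo_toolchain_linux_x86_32_cross64",
        "exec_compatible_with": LINUX_32,
        "target_compatible_with": LINUX_64,
    },
]
```

With these definitions, select_toolchain(DEMO_TOOLCHAINS, LINUX_32, LINUX_64) picks demo_toolchain_linux_x86_32_cross64, while asking for a 64-bit execution platform returns None because neither toolchain can run there.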

Platform Selection Flags

Bazel (as of 0.13) has two flags to override the platform selection, which are useful when the execution platform is custom-defined or different in some important way from the machine running Bazel. The most common reason is if you're building with remote workers.

The --platforms flag specifies which platforms the final compiled binaries will run on. This flag can accept multiple platforms, in which case Bazel may generate multiple outputs for a build artifact.
The --host_platform flag overrides which platform is used for executing build commands. I'm hopeful that this flag could be split into --host_platform and --remote_platform in future versions of Bazel, so that some actions can be run locally even if the distributed build pool is different from the local workstation.

There are also the --cpu and --host_cpu flags, which (if I understand correctly) are deprecated and exist only because the built-in C++ rules haven't been migrated to the toolchains system yet.

Prebuilt Toolchains

Compiler toolchains are often large and take a while to build. Downloading prebuilt toolchains can materially improve your users' experience, but there are some extra details to be aware of:

Do not use uname, inspection of /proc, or similar unsandboxed mechanisms to discover the execution platform. These interfere with users' customizations of the build environment, and can cause incorrect behavior when the execution platform is different from the machine where the user is running Bazel.
If the toolchain is downloaded by a custom repository rule, put it in its own .bzl file. Repository rules are invalidated by any changes to the .bzl file they're defined in, and you don't want small changes to toolchains to force a re-download of large toolchain archives.