Bazel Toolchains

I’ve recently been using Bazel as a multi-platform distributed build system. Bazel itself supports this pretty well, but many of the user-contributed extension libraries don’t make good use of Bazel’s toolchains and therefore break when multiple OSes are involved in a build. I hope the situation can be improved by documenting nascent best practices.

This page is a bit advanced. It assumes background knowledge in cross compilation, plus experience with Bazel’s Skylark extension language, build rules, and repository definitions. Most users of Bazel shouldn’t need to care about the details of compiler toolchains, but this is important stuff for maintainers of language rules.

Constraints

Bazel’s package/toolchain design is based on constraints, which are simple text key/value pairs. Keys are defined by constraint_setting, and values by constraint_value. Settings and values are true targets, which means they’re addressed by label, obey visibility, and can be aliased.

A couple of basic constraint settings, each with a handful of values, come predefined in @bazel_tools//platforms:

@bazel_tools//platforms:cpu
  @bazel_tools//platforms:arm
  @bazel_tools//platforms:ppc
  @bazel_tools//platforms:s390x
  @bazel_tools//platforms:x86_32
  @bazel_tools//platforms:x86_64

@bazel_tools//platforms:os
  @bazel_tools//platforms:freebsd
  @bazel_tools//platforms:linux
  @bazel_tools//platforms:osx
  @bazel_tools//platforms:windows

Note the limited selection and lack of precision. These definitions are (as of Bazel 0.13) useful only for getting started. Most language rules will want to define their own – see @io_bazel_rules_go//go/toolchain:toolchains.bzl for an example of custom values for the built-in settings.

Platforms

Upstream docs:

A platform is a named set of constraint values (see above), plus some other metadata that I’m going to skip because it’s part of the not-fully-implemented remote execution API. A platform can contain any number of constraint values, but at most one constraint value per constraint setting (i.e. you can’t have a platform with two CPU types). Be specific: Autoconf’s “GNU Triplets” are a good model to imitate here.

platform(
    name = "x86_64-apple-darwin",
    constraint_values = [
        "@bazel_tools//platforms:osx",
        "@bazel_tools//platforms:x86_64",
    ],
)

platform(
    name = "i686-linux-gnu",
    constraint_values = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
)
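The "at most one constraint value per constraint setting" rule can be illustrated with a small standalone model. This is a plain Python sketch, not Bazel code: VALUE_TO_SETTING and check_platform are hypothetical names invented for the illustration, and the table only covers the labels used in the examples above.

```python
# Hypothetical lookup table: constraint value label -> constraint setting label.
# Bazel derives this from constraint_value targets; here it is written by hand.
CPU = "@bazel_tools//platforms:cpu"
OS = "@bazel_tools//platforms:os"

VALUE_TO_SETTING = {
    "@bazel_tools//platforms:x86_32": CPU,
    "@bazel_tools//platforms:x86_64": CPU,
    "@bazel_tools//platforms:linux": OS,
    "@bazel_tools//platforms:osx": OS,
}


def check_platform(name, constraint_values):
    """Return {setting: value} for a platform.

    Raises ValueError if two values belong to the same setting,
    e.g. a platform that lists two CPU types.
    """
    chosen = {}
    for value in constraint_values:
        setting = VALUE_TO_SETTING[value]
        if setting in chosen:
            raise ValueError(
                "platform %r sets %s twice: %s and %s"
                % (name, setting, chosen[setting], value)
            )
        chosen[setting] = value
    return chosen


ok = check_platform(
    "i686-linux-gnu",
    ["@bazel_tools//platforms:linux", "@bazel_tools//platforms:x86_32"],
)
print(ok[CPU])  # prints "@bazel_tools//platforms:x86_32"
```

Passing both x86_32 and x86_64 to check_platform raises ValueError, which mirrors the restriction described above.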

Defining Toolchains

Upstream docs:

To work with cross-compilation, toolchains need to (1) be capable of generating non-native output binaries and (2) define their Bazel constraints.

Toolchain Types

Each category of toolchain is identified by a toolchain type, which is a string in the format of a build label. There is no requirement that the value actually match any defined label. I recommend using an @-prefixed toolchain type to avoid potential conflicts in workspaces with multiple language rules loaded.

DEMO_TOOLCHAIN = "@rules_demo//:demo_toolchain_type"

ToolchainInfo

The ToolchainInfo provider is how your rules expose toolchain configuration to Bazel. There are no special requirements about the values you can put in, so feel free to use whatever makes sense for your language.

Skylark doesn’t have a public/private distinction for struct attributes, so a convention of underscore-prefixed attribute names is borrowed from Python. It’s easy for rule implementations to get access to the ToolchainInfo for any registered toolchain, so be clear in your docs which attributes are part of your public API.

First you define a rule type for your toolchain info:

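def _demo_toolchain_info(ctx):
  # Expose the toolchain configuration to rules that request this toolchain.
  return [
      platform_common.ToolchainInfo(
          compiler = ctx.attr.compiler,
          cflags = ctx.attr.cflags,
      ),
  ]

demo_toolchain_info = rule(
    _demo_toolchain_info,
    attrs = {
        # Public (no underscore) so ctx.attr.compiler resolves and individual
        # toolchain targets can point at a different compiler binary.
        "compiler": attr.label(
            executable = True,
            default = "//:demo_compiler",
            cfg = "host",
        ),
        "cflags": attr.string_list(),
    },
)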

Then use it to create toolchain info targets, one for each unique configuration you might want to build with:

load(":toolchain.bzl", "demo_toolchain_info")

demo_toolchain_info(
    name = "demo_toolchain_info/i686-linux-gnu",
    cflags = ["--target-os=linux", "--target-arch=i686"],
)

demo_toolchain_info(
    name = "demo_toolchain_info/x86_64-linux-gnu",
    cflags = ["--target-os=linux", "--target-arch=amd64"],
)

Registration

Once you’ve got your ToolchainInfo rules defined, the next step is to register them. This is where the info is associated with the toolchain type and the constraint values so Bazel can auto-detect which toolchains are usable on a particular platform.

load(":my_toolchain.bzl", "MY_TOOLCHAIN")

toolchain(
    name = "my_toolchain_linux_x86_32",
    exec_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
    target_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
    toolchain = ":my_toolchain_info_linux_x86_32",
    toolchain_type = MY_TOOLCHAIN,
)

toolchain(
    name = "my_toolchain_linux_x86_64",
    # [...]
)

Finally, the toolchains Bazel can use are passed to register_toolchains in your WORKSPACE. Usually this is done in a helper macro defined in the language rules, so that both the toolchain() rules and register_toolchains(...) args can be generated by the same logic.

register_toolchains(
    "//:my_toolchain_linux_x86_32",
    "//:my_toolchain_linux_x86_64",
)
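The helper-macro idea, where one piece of logic produces both the toolchain() declarations and the register_toolchains() arguments, can be sketched as a plain Python model. The names SUPPORTED, toolchain_name, toolchain_kwargs, and register_labels are hypothetical, and real language rules would emit the toolchain() calls from a Skylark macro rather than return dictionaries.

```python
# Single source of truth: the (os, cpu) pairs the language rules support.
SUPPORTED = [("linux", "x86_32"), ("linux", "x86_64")]


def toolchain_name(os, cpu):
    """Name shared by the toolchain() target and its registration label."""
    return "my_toolchain_%s_%s" % (os, cpu)


def toolchain_kwargs():
    """Keyword arguments for one native toolchain() call per supported pair.

    toolchain_type is left out to keep the sketch short.
    """
    result = []
    for os, cpu in SUPPORTED:
        constraints = [
            "@bazel_tools//platforms:" + os,
            "@bazel_tools//platforms:" + cpu,
        ]
        result.append({
            "name": toolchain_name(os, cpu),
            "exec_compatible_with": constraints,
            "target_compatible_with": constraints,
            "toolchain": ":my_toolchain_info_%s_%s" % (os, cpu),
        })
    return result


def register_labels():
    """Labels to hand to register_toolchains()."""
    return ["//:" + toolchain_name(os, cpu) for os, cpu in SUPPORTED]


print(register_labels())
```

Because both functions iterate over the same list, adding a platform in one place keeps the declared and registered toolchains in step.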

Using Toolchains

Rules can declare which toolchain types they depend on, like “needs a C++ compiler”. When defining the rule, set the toolchains param to all the toolchain types that will be needed to run the action. Then within the implementation, fetch the ToolchainInfo values (the same ones defined in the toolchain info rule) and inspect the content to implement your build.

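load(":my_toolchain.bzl", "MY_TOOLCHAIN")

def _my_rule(ctx):
  # Look up the resolved ToolchainInfo by its toolchain type.
  tc = ctx.toolchains[MY_TOOLCHAIN]
  print("toolchain: %s %r" % (tc.compiler, tc.cflags))

test_rule = rule(
    _my_rule,
    # Every toolchain type the implementation reads must be listed here.
    toolchains = [MY_TOOLCHAIN],
)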

Cross-Compilation

Toolchains can have different exec_compatible_with and target_compatible_with attrs. The execution compatibility describes the platform that runs the build actions (i.e. the worker), and the target compatibility describes the platforms the toolchain can generate output for.

Here’s the definition of a cross-compiling toolchain that runs on 32-bit Linux but generates output for 64-bit Linux:

# MY_TOOLCHAIN is loaded too, since toolchain_type below refers to it.
load(":my_toolchain.bzl", "MY_TOOLCHAIN", "my_toolchain_info")

my_toolchain_info(
    name = "my_toolchain_info_linux_x86_32_cross64",
    compiler = "@my_prebuilt_compiler_linux_x86_32//:my_compiler",
    cflags = ["--target-os=linux", "--target-arch=amd64"],
)

toolchain(
    name = "my_toolchain_linux_x86_32_cross64",
    # Runs on 32-bit Linux...
    exec_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
    # ...and generates output for 64-bit Linux.
    target_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_64",
    ],
    toolchain = ":my_toolchain_info_linux_x86_32_cross64",
    toolchain_type = MY_TOOLCHAIN,
)
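To make the roles of the two compatibility lists concrete, here is a simplified Python model of how a toolchain might be picked for a given pair of platforms. It is a sketch under assumptions, not Bazel's actual resolution code: the Toolchain tuple and resolve function are hypothetical, platforms are reduced to lists of constraint value labels, and "first registered match wins" is assumed as the tie-breaking rule.

```python
from collections import namedtuple

# Hypothetical stand-in for a registered toolchain() target.
Toolchain = namedtuple(
    "Toolchain", ["name", "exec_compatible_with", "target_compatible_with"]
)

LINUX = "@bazel_tools//platforms:linux"
X86_32 = "@bazel_tools//platforms:x86_32"
X86_64 = "@bazel_tools//platforms:x86_64"


def resolve(toolchains, exec_platform, target_platform):
    """Return the first toolchain usable for this platform pair, else None.

    A toolchain is usable when every constraint it lists is present in the
    corresponding platform's constraint values.
    """
    for tc in toolchains:
        runs_here = set(tc.exec_compatible_with) <= set(exec_platform)
        builds_for = set(tc.target_compatible_with) <= set(target_platform)
        if runs_here and builds_for:
            return tc
    return None


REGISTERED = [
    Toolchain("my_toolchain_linux_x86_32", [LINUX, X86_32], [LINUX, X86_32]),
    Toolchain(
        "my_toolchain_linux_x86_32_cross64", [LINUX, X86_32], [LINUX, X86_64]
    ),
]

# A 32-bit Linux worker building for 64-bit Linux selects the cross toolchain.
picked = resolve(REGISTERED, [LINUX, X86_32], [LINUX, X86_64])
print(picked.name)  # prints "my_toolchain_linux_x86_32_cross64"
```

In this model a 64-bit worker finds no usable toolchain at all, because neither registered entry lists x86_64 under exec_compatible_with.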

Platform Selection Flags

Bazel (as of 0.13) has two flags to override the platform selection, which are useful when the execution platform is custom-defined or differs in some important way from the machine running Bazel. The most common case is building with remote workers.

There are also the --cpu and --host_cpu flags, which (if I understand correctly) are deprecated and exist only because the built-in C++ rules haven’t been migrated to the toolchains system yet.

Prebuilt Toolchains

Compiler toolchains are often large and take a while to build. Downloading prebuilt toolchains can materially improve your users’ experience, but there are some extra details to be aware of: