I've recently been using Bazel as a multi-platform distributed build system. Bazel itself supports this pretty well, but many of the user-contributed extension libraries don't make good use of Bazel's toolchains and therefore break when multiple OSes are involved in a build. I hope the situation can be improved by documenting nascent best practices.
This page is a bit advanced. It assumes background knowledge in cross compilation, plus experience with Bazel's Starlark extension language, build rules, and repository definitions. Most users of Bazel shouldn't need to care about the details of compiler toolchains, but this is important stuff for maintainers of language rules.
Bazel's package/toolchain design is based on constraints, which are simple text key/value pairs. Keys are defined by constraint_setting, and values by constraint_value. Settings and values are true targets, which means they're addressed by label, obey visibility, and can be aliased.
A couple basic constraints come predefined in @bazel_tools//platforms:
@bazel_tools//platforms:cpu
    @bazel_tools//platforms:arm
    @bazel_tools//platforms:ppc
    @bazel_tools//platforms:s390x
    @bazel_tools//platforms:x86_32
    @bazel_tools//platforms:x86_64
@bazel_tools//platforms:os
    @bazel_tools//platforms:freebsd
    @bazel_tools//platforms:linux
    @bazel_tools//platforms:osx
    @bazel_tools//platforms:windows
Note the limited selection and lack of precision. These definitions are (as of Bazel 0.13) useful only for getting started. Most language rules will want to define their own – see @io_bazel_rules_go//go/toolchain:toolchains.bzl for an example of custom values for the built-in settings.
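As a rough illustration of defining your own constraints, the sketch below adds one extra value to the built-in cpu setting and declares a brand-new setting with two values. The package path, the arm64 value, and the libc setting are all hypothetical names chosen for this example; they are not taken from rules_go or from Bazel itself.

```starlark
# constraints/BUILD (hypothetical package)

# Constraints are ordinary targets, so they need to be visible to other packages.
package(default_visibility = ["//visibility:public"])

# A custom value for one of the built-in settings.
constraint_value(
    name = "arm64",
    constraint_setting = "@bazel_tools//platforms:cpu",
)

# A new setting (the key) with two possible values.
constraint_setting(name = "libc")

constraint_value(
    name = "glibc",
    constraint_setting = ":libc",
)

constraint_value(
    name = "musl",
    constraint_setting = ":libc",
)
```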
A platform is a named set of constraint values (see above), plus some other metadata that I'm going to skip because it's part of the not-fully-implemented remote execution API. A platform can contain any number of constraint values, but at most one constraint value per constraint setting (i.e. you can't have a platform with two CPU types). Be specific – Autoconf's "GNU Triplets" are a good model to imitate here.
# platforms/BUILD
platform(
    name = "x86_64-apple-darwin",
    constraint_values = [
        "@bazel_tools//platforms:osx",
        "@bazel_tools//platforms:x86_64",
    ],
)

platform(
    name = "i686-linux-gnu",
    constraint_values = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
)
The rest of this page will use the standard platform definitions built into Bazel. Custom platforms are useful if you need to constrain on other dimensions, such as CPU vendor or libc version.
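For instance, a platform that also constrains on libc might look something like the sketch below. The //constraints:musl label is a hypothetical constraint_value that is assumed to be defined elsewhere in your workspace; Bazel does not ship one.

```starlark
# platforms/BUILD
platform(
    name = "x86_64-linux-musl",
    constraint_values = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_64",
        # Hypothetical custom constraint value, assumed to be defined in //constraints.
        "//constraints:musl",
    ],
)
```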
To work with cross-compilation, toolchains need to (1) be capable of generating non-native output binaries and (2) define their Bazel constraints.
Each category of toolchain is identified by a toolchain type, which is a string in the format of a build label. There is no requirement that the value actually match any defined label. I recommend using a @-prefixed toolchain type, to avoid potential conflicts in workspaces with multiple language rules loaded.
The ToolchainInfo provider is how your rules pass toolchain configuration to Bazel. There are no special requirements about the values you can put in, so feel free to use whatever makes sense for your language.
Starlark doesn't have a public/private distinction for struct attributes, so a convention of underscore-prefixed attribute names is borrowed from Python. It's easy for rule implementations to get access to the ToolchainInfo for any registered toolchain, so be clear in your docs which attributes are part of your public API.
First you define a rule type for your toolchain info:
# demo_toolchain.bzl
DEMO_TOOLCHAIN = "@rules_demo//:demo_toolchain_type"

def _demo_toolchain_info(ctx):
    return [
        platform_common.ToolchainInfo(
            compiler = ctx.attr.compiler,
            cflags = ctx.attr.cflags,
        ),
    ]

demo_toolchain_info = rule(
    _demo_toolchain_info,
    attrs = {
        # Public (no underscore) so the implementation can read ctx.attr.compiler
        # and BUILD files can override the default compiler.
        "compiler": attr.label(
            executable = True,
            default = "//:demo_compiler",
            cfg = "host",
        ),
        "cflags": attr.string_list(),
    },
)
Then use it to create toolchain info targets, one for each unique configuration you might want to build with:
# BUILD
load(":demo_toolchain.bzl", "DEMO_TOOLCHAIN", "demo_toolchain_info")

demo_toolchain_info(
    name = "demo_toolchain_info/i686-linux-gnu",
    cflags = ["--target-os=linux", "--target-arch=i686"],
)

demo_toolchain_info(
    name = "demo_toolchain_info/x86_64-linux-gnu",
    cflags = ["--target-os=linux", "--target-arch=amd64"],
)
Once you've got your ToolchainInfo rules defined, the next step is to register them. This is where the info is associated with the toolchain type and the constraint values so Bazel can auto-detect which toolchains are usable on a particular platform.
# BUILD
toolchain(
    name = "demo_toolchain_linux_x86_32",
    exec_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
    target_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
    toolchain = ":demo_toolchain_info/i686-linux-gnu",
    toolchain_type = DEMO_TOOLCHAIN,
)

toolchain(
    name = "demo_toolchain_linux_x86_64",
    # [...]
)
Finally, the toolchains Bazel can use are passed to register_toolchains in your WORKSPACE. Usually this is done in a helper macro defined in the language rules, so that both the toolchain() rules and register_toolchains(...) args can be generated by the same logic.
# WORKSPACE
register_toolchains(
    "//:demo_toolchain_linux_x86_32",
    "//:demo_toolchain_linux_x86_64",
)
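A helper macro of the kind mentioned above could be sketched roughly as follows. The file name toolchains.bzl, the two macro names, and the _DEMO_TOOLCHAINS table are hypothetical, and the sketch assumes the rules live in a repository named @rules_demo (matching the DEMO_TOOLCHAIN string) with the toolchain info targets in its root package.

```starlark
# toolchains.bzl (hypothetical helper file in the rules_demo repository)
load(":demo_toolchain.bzl", "DEMO_TOOLCHAIN")

# Toolchain name suffix -> (toolchain info target, constraint values).
# Both macros below read this one table, so they can't drift apart.
_DEMO_TOOLCHAINS = {
    "linux_x86_32": (
        ":demo_toolchain_info/i686-linux-gnu",
        ["@bazel_tools//platforms:linux", "@bazel_tools//platforms:x86_32"],
    ),
    "linux_x86_64": (
        ":demo_toolchain_info/x86_64-linux-gnu",
        ["@bazel_tools//platforms:linux", "@bazel_tools//platforms:x86_64"],
    ),
}

def demo_declare_toolchains():
    """Call from the BUILD file that defines the toolchain info targets."""
    for suffix, spec in _DEMO_TOOLCHAINS.items():
        info, constraints = spec
        native.toolchain(
            name = "demo_toolchain_" + suffix,
            # Native (non-cross) toolchains: same constraints for exec and target.
            exec_compatible_with = constraints,
            target_compatible_with = constraints,
            toolchain = info,
            toolchain_type = DEMO_TOOLCHAIN,
        )

def demo_register_toolchains():
    """Call from WORKSPACE, after the rules_demo repository is declared."""
    native.register_toolchains(*[
        "@rules_demo//:demo_toolchain_" + suffix
        for suffix in _DEMO_TOOLCHAINS.keys()
    ])
```

With something like this in place, adding a platform means adding one entry to the table rather than editing BUILD and WORKSPACE separately.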
Rules can declare which types of toolchain they depend on, like "needs a C++ compiler". When defining the rule, set the toolchains param to all the toolchain types that will be needed to run the action. Then within the implementation, fetch the ToolchainInfo values (the same ones defined in the toolchain info rule) and inspect the content to implement your build.
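# rules.bzl
load(":demo_toolchain.bzl", "DEMO_TOOLCHAIN")

def _demo_rule(ctx):
    # Look up the toolchain Bazel resolved for this toolchain type.
    tc = ctx.toolchains[DEMO_TOOLCHAIN]
    print("toolchain: %s %r" % (tc.compiler, tc.cflags))

demo_rule = rule(
    _demo_rule,
    toolchains = [DEMO_TOOLCHAIN],
)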
Toolchains can have different exec_compatible_with and target_compatible_with attrs. The execution compatibility describes the platform that runs build actions (i.e. the worker), and the target compatibility describes the platforms the toolchain can generate output for.
Here's the definition of a cross-compiling toolchain that runs on 32-bit Linux but generates output for 64-bit Linux:
# BUILD
load(":demo_toolchain.bzl", "demo_toolchain_info")

demo_toolchain_info(
    name = "demo_toolchain_info_linux_x86_32_cross64",
    compiler = "@demo_prebuilt_compiler_linux_x86_32//:demo_compiler",
    cflags = ["--target-os=linux", "--target-arch=amd64"],
)

toolchain(
    name = "demo_toolchain_linux_x86_32_cross64",
    exec_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_32",
    ],
    target_compatible_with = [
        "@bazel_tools//platforms:linux",
        "@bazel_tools//platforms:x86_64",
    ],
    toolchain = ":demo_toolchain_info_linux_x86_32_cross64",
    toolchain_type = DEMO_TOOLCHAIN,
)
Bazel (as of 0.13) has two flags to override the platform selection, which are useful when the execution platform is custom-defined or different in some important way from the machine running Bazel. The most common reason is if you're building with remote workers.
The --platforms flag specifies which platforms the final compiled binaries will run on. This flag can accept multiple platforms, in which case Bazel may generate multiple outputs for a build artifact.
The --host_platform flag overrides which platform is used for executing build commands. I'm hopeful that this flag could be split into --host_platform and --remote_platform in future versions of Bazel, so that some actions can be run locally even if the distributed build pool is different from the local workstation.
There are also the --cpu and --host_cpu flags, which (if I understand correctly) are deprecated and exist only because the built-in C++ rules haven't been migrated to the toolchains system yet.
Compiler toolchains are often large, and take a while to build. Downloading prebuilt toolchains can materially improve your users' experience, but there are some extra details to be aware of:
Don't use uname, inspection of /proc, or similar unsandboxed mechanisms to discover the execution platform. These interfere with users' customizations of the build environment, and can cause incorrect behavior when the execution platform is different from where the user is running Bazel.
Define the repository rules that download toolchains in their own .bzl file. Repository rules are invalidated by any changes to the .bzl file they're defined in, and you don't want small changes to toolchains to force a re-download of large toolchain archives.
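As a minimal sketch of both points, a download rule kept in its own file might look something like this. The file name, the rule name, and the assumption that the archive unpacks to a single demo_compiler executable at its root are all hypothetical. The platform is chosen explicitly by whoever instantiates the rule (one repository per execution platform), rather than being detected from the machine running Bazel.

```starlark
# demo_prebuilt.bzl (hypothetical file that holds only the download rule)

def _demo_prebuilt_compiler_impl(repository_ctx):
    # Fetch and unpack the archive named by the caller. Nothing here inspects
    # the local machine, so the result is the same wherever Bazel runs.
    repository_ctx.download_and_extract(
        url = repository_ctx.attr.urls,
        sha256 = repository_ctx.attr.sha256,
    )

    # Assumes the archive contains an executable named demo_compiler at its root.
    repository_ctx.file(
        "BUILD",
        'exports_files(["demo_compiler"], visibility = ["//visibility:public"])\n',
    )

demo_prebuilt_compiler = repository_rule(
    _demo_prebuilt_compiler_impl,
    attrs = {
        "urls": attr.string_list(mandatory = True),
        "sha256": attr.string(mandatory = True),
    },
)
```

Instantiating this in WORKSPACE with a name such as demo_prebuilt_compiler_linux_x86_32 would make the compiler available under a label like @demo_prebuilt_compiler_linux_x86_32//:demo_compiler, and edits to the toolchain or rule definitions in other .bzl files would not invalidate the download.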