How to install and run QEMU on K8s nodes?

If you need to emulate different CPU architectures in a Kubernetes environment, you’ll probably end up running some version of binfmt (which registers QEMU emulators with the node’s kernel) as an init container, or maybe running it manually once.

That would probably be OK in most cases, but it wouldn’t work if, for example, you’re running multiple pod replicas on the same node concurrently (say, when you set up a CI system that spawns a pod for every job in your pipeline) or when nodes are autoscaled.

There are probably ways to make sure only one pod sets up binfmt, but that is very difficult to orchestrate when pods run concurrently.

But there’s light at the end of the tunnel: a DaemonSet, which we can use to run binfmt once on every newly created node.

You just need to create a simple config, saved as daemonset.yaml:

# Run binfmt setup on any new node
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: binfmt
  labels:
    app: binfmt-setup
spec:
  selector:
    matchLabels:
      name: binfmt
  # https://kubernetes.io/docs/concepts/workloads/pods/#pod-templates
  template:
    metadata:
      labels:
        name: binfmt
    spec:
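      tolerations:
        # Have the daemonset runnable on control-plane (master) nodes
        # NOTE: Remove these if your control-plane nodes can't run pods
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        # Newer Kubernetes releases taint control-plane nodes with this key instead
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule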
      initContainers:
        - name: binfmt
          image: tonistiigi/binfmt
          # command: []
          args: ["--install", "all"]
          # Run the container with the privileged flag
          # https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
          # https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#securitycontext-v1-core
          securityContext:
            privileged: true
      containers:
        - name: pause
          image: gcr.io/google_containers/pause
          resources:
            limits:
              cpu: 50m
              memory: 50Mi
            requests:
              cpu: 50m
              memory: 50Mi

And apply it:

kubectl apply -f ./daemonset.yaml

That’s it. binfmt should now be set up on every node in your cluster.
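If you want to double-check, something along these lines should do. It is only a sketch: it assumes the DaemonSet was applied to your current namespace and that your nodes are amd64, so an arm64 image actually exercises the emulation. The function name verify_binfmt and the pod name binfmt-check are made up for this example.

```shell
#!/bin/sh
# Sketch: confirm the binfmt DaemonSet rolled out, then try a foreign-architecture image.
# Assumes the DaemonSet from daemonset.yaml lives in the current namespace.

verify_binfmt() {
  # Wait until every scheduled binfmt pod is ready (or the timeout expires).
  kubectl rollout status daemonset/binfmt --timeout=120s || return 1

  # Show which node each pod landed on; the label matches the pod template.
  kubectl get pods -l name=binfmt -o wide || return 1

  # Assumption: nodes are amd64, so an arm64 image needs QEMU to run.
  # "binfmt-check" is a hypothetical, throwaway pod name.
  kubectl run binfmt-check --rm --attach --restart=Never \
    --image=arm64v8/alpine -- uname -m
}
```

Call verify_binfmt after the apply step. On an amd64 node with working emulation, the last command should report aarch64; if the pod fails to start with an exec format error, binfmt is most likely not registered on the node it landed on.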

Note

If you later decide on a different approach, you can delete the DaemonSet along with the pods it created:

kubectl delete daemonset binfmt --namespace=default