Building an IPv6-Only Kubernetes Cluster with Talos on Hetzner Cloud

The Goal

  • IPv6 single-stack Talos multi-node cluster
  • Certificate rotation (mTLS)
  • Reachable via IPv6 and IPv4 (inbound traffic)
  • Access IPv6 and IPv4 resources (outbound traffic)
  • TCP & UDP load balancing
  • HTTP/1.1 & HTTP/2 GatewayAPI

Infrastructure

Before we can install Talos, we need to create the cloud infrastructure. Here I am using Hetzner Cloud for affordable hosting. Of course, this will also work with any other cloud provider that offers firewalls, or even in your home lab.

Once you’ve got your API token (read & write), you can manage your Hetzner resources from your PC:

# Install Hetzner CLI tool
brew install hcloud
# Temporarily define Hetzner API key
read -rp 'Hetzner API key: ' HCLOUD_TOKEN
export HCLOUD_TOKEN
# Make sure it works
hcloud server list

Creating a Talos snapshot

Before we can order the servers, we need to prepare a Talos snapshot to base them on. While Hetzner does provide Talos ISOs, building our own snapshot ensures we get the latest / desired version right away. To create this snapshot, we’ll make use of the Talos image factory and HashiCorp Packer:

# Install HashiCorp Packer
brew install hashicorp/tap/packer
# Optional: Define a GitHub personal access token to avoid rate-limiting errors
# Visit https://github.com/settings/personal-access-tokens
#  and create a fine-grained token with read-only access to public repositories
read -rp 'GitHub PAT: ' PACKER_GITHUB_API_TOKEN
export PACKER_GITHUB_API_TOKEN
# Define the steps needed to create the snapshot
cat > talos.pkr.hcl <<'EOF'
packer {
  required_plugins {
    hcloud = {
      source  = "github.com/hetznercloud/hcloud"
      version = "~> 1"
    }
  }
}
variable "talos_version" {
  type    = string
  # Change to the latest / desired version
  default = "v1.12.4"
}
variable "arch" {
  type    = string
  default = "amd64"
}
variable "server_type" {
  type    = string
  default = "cx23"
}
variable "server_location" {
  type    = string
  default = "nbg1"
}
locals {
  # Default Hetzner Cloud image from https://factory.talos.dev
  image = "https://factory.talos.dev/image/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/${var.talos_version}/hcloud-${var.arch}.raw.xz"
}
source "hcloud" "talos" {
  rescue               = "linux64"
  image                = "debian-13"
  location             = "${var.server_location}"
  server_type          = "${var.server_type}"
  public_ipv4_disabled = true
  ssh_username         = "root"

  snapshot_name   = "Talos ${var.talos_version} ${var.arch}"
  snapshot_labels = {
    type    = "infra",
    os      = "talos",
    version = "${var.talos_version}",
    arch    = "${var.arch}",
  }
}
build {
  sources = ["source.hcloud.talos"]
  provisioner "shell" {
    inline = [
      # Use NAT64 (https://nat64.net) to resolve 'factory.talos.dev': https://github.com/siderolabs/image-factory/issues/60
      "echo 'DNS=2a00:1098:2b::1 2a01:4f8:c2c:123f::1 2a01:4f9:c010:3f02::1' >> /etc/systemd/resolved.conf",
      "service systemd-resolved restart",
      "wget --inet6-only --progress=dot:giga -O /tmp/talos.raw.xz ${local.image}",
      "echo \"$(date +%H:%M:%S) | Decompressing and writing image (takes about 5 minutes)...\"",
      "xz -d -c /tmp/talos.raw.xz | dd of=/dev/sda && sync",
    ]
  }
}
EOF
# Create the snapshot (takes about 10 minutes)
packer init .
packer build .

When using another cloud provider, check whether they offer Talos Linux directly or provide a HashiCorp Packer plugin. On dedicated servers, follow the official installation instructions. When using Proxmox, mind the MTU requirements of Talos and the MTU restrictions of your network.
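
The MTU can be pinned in the Talos machine configuration. A minimal sketch (the interface name and MTU value are placeholders for your environment), which could be added as another config patch alongside the ones created later in this post:

machine:
  network:
    interfaces:
      # Placeholder interface name and MTU - adjust to your network
      - interface: eth0
        dhcp: true
        mtu: 1400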

Ordering the servers

Now it’s time to get the servers! You can start out with one load balancer and one node (or even just the node) for testing or go all in. Choose the server types that fit your workload and budget. Run hcloud server-type list to see the available options.

Using floating IPs is optional but allows you to switch over traffic to another load balancer (e.g. for maintenance) without needing to wait for DNS changes to propagate. The API endpoint URL of a Kubernetes cluster cannot be changed after creation. For better availability, IP failover could be configured to switch between multiple load balancers. This is usually achieved using keepalived, which uses VRRP (layer 2). Because Hetzner Cloud only provides us with layer 3 network access, VRRP can’t be used here - so I’ll leave IP failover as a future topic.
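
Even without VRRP, a manual failover is still possible through the Hetzner API, e.g. by reassigning the floating IPs to a second load balancer (load-balancer-2 is a hypothetical name) and adding them to its network configuration there:

# Manual failover sketch: move the floating IPs to another load balancer
hcloud floating-ip assign load-balancer-ipv4 load-balancer-2
hcloud floating-ip assign load-balancer-ipv6 load-balancer-2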

Note that Talos nodes start with an initially unauthenticated management port (50000). So make sure a firewall is in place before you start them!

# Create SSH key (if not already created)
# I'm using the key on the Talos instances as well; it has no effect there, but it avoids getting a root password email on each rebuild
hcloud ssh-key create --name ssh-key --public-key-from-file ~/.ssh/id_ed25519.pub
# Create firewall ruleset to allow SSH for maintenance
hcloud firewall create --name accept-ssh
hcloud firewall add-rule --direction in --protocol tcp --port 22 --source-ips '0.0.0.0/0' --source-ips '::/0' --description 'SSH' accept-ssh
# Create firewall ruleset for the internal load-balancer
hcloud firewall create --name internal-load-balancer
hcloud firewall apply-to-resource --type label_selector --label-selector 'type=load-balancer' internal-load-balancer
# Create firewall ruleset for the external load-balancer
hcloud firewall create --name external-load-balancer
hcloud firewall apply-to-resource --type label_selector --label-selector 'type=load-balancer' external-load-balancer
hcloud firewall add-rule --direction in --protocol tcp --port 80 --source-ips 0.0.0.0/0 --source-ips ::/0 --description 'HTTP' external-load-balancer
hcloud firewall add-rule --direction in --protocol tcp --port 443 --source-ips 0.0.0.0/0 --source-ips ::/0 --description 'HTTPS' external-load-balancer
hcloud firewall add-rule --direction in --protocol udp --port 443 --source-ips 0.0.0.0/0 --source-ips ::/0 --description 'HTTP/3' external-load-balancer
# Create firewall ruleset for the control-plane nodes
hcloud firewall create --name kubernetes-control-plane
hcloud firewall apply-to-resource --type label_selector --label-selector 'kubernetes-control-plane-role' kubernetes-control-plane
# Optional: Create firewall ruleset for the worker nodes
hcloud firewall create --name kubernetes-worker
hcloud firewall apply-to-resource --type label_selector --label-selector 'kubernetes-worker-role' kubernetes-worker
# Create nodes
TALOS_SNAPSHOT="$(hcloud image list | grep -i talos -m1 | awk '{print $1}')"
hcloud server create --type cx33 --name node-1 --image "$TALOS_SNAPSHOT" --without-ipv4 --location nbg1 --label 'type=kubernetes-node' --label 'kubernetes-control-plane-role' --label 'kubernetes-worker-role' --ssh-key ssh-key --enable-protection rebuild,delete
hcloud server create --type cx33 --name node-2 --image "$TALOS_SNAPSHOT" --without-ipv4 --location fsn1 --label 'type=kubernetes-node' --label 'kubernetes-control-plane-role' --label 'kubernetes-worker-role' --ssh-key ssh-key --enable-protection rebuild,delete
hcloud server create --type cx33 --name node-3 --image "$TALOS_SNAPSHOT" --without-ipv4 --location hel1 --label 'type=kubernetes-node' --label 'kubernetes-control-plane-role' --label 'kubernetes-worker-role' --ssh-key ssh-key --enable-protection rebuild,delete
# Create load balancers
# The public IPv4 address is also required for the floating IPv4 address
hcloud server create --type cx23 --name load-balancer-1 --image ubuntu-24.04 --firewall accept-ssh --location nbg1 --label 'type=load-balancer' --ssh-key ssh-key --enable-protection rebuild,delete
# Create floating IPs
hcloud floating-ip create --type ipv4 --server load-balancer-1 --name load-balancer-ipv4 --description "Load balancer IPv4" --enable-protection delete
hcloud floating-ip create --type ipv6 --server load-balancer-1 --name load-balancer-ipv6 --description "Load balancer IPv6" --enable-protection delete
# Add server IPs to firewalls
export FROM_FLOATING_IPV6="--source-ips $(hcloud floating-ip list -o columns=ip,name | grep load-balancer-ipv6 | awk '{print $1}')"
export FROM_LOAD_BALANCERS="$FROM_FLOATING_IPV6 $(hcloud server list -o columns=ipv6 -o noheader -l type=load-balancer | sed 's|^|--source-ips |')"
# The IP addresses you're administering the cluster from (running kubectl, k9s, talosctl)
# Here I'm using the load balancer host to run CLI commands via SSH
export FROM_ADMINISTRATION="$FROM_LOAD_BALANCERS"
export FROM_NODES="$(hcloud server list -o columns=ipv6 -o noheader -l type=kubernetes-node | sed 's|^|--source-ips |')"
export FROM_CONTROL_PLANE="$(hcloud server list -o columns=ipv6 -o noheader -l kubernetes-control-plane-role | sed 's|^|--source-ips |')"
export FROM_WORKERS="$(hcloud server list -o columns=ipv6 -o noheader -l kubernetes-worker-role | sed 's|^|--source-ips |')"
hcloud firewall add-rule --direction in --protocol tcp --port 6443 $FROM_ADMINISTRATION $FROM_NODES --description 'Kubernetes API server' internal-load-balancer
hcloud firewall add-rule --direction in --protocol tcp --port 2379-2380 $FROM_CONTROL_PLANE --description 'Kubernetes etcd' kubernetes-control-plane
hcloud firewall add-rule --direction in --protocol tcp --port 6443 $FROM_LOAD_BALANCERS --description 'Kubernetes API server' kubernetes-control-plane
hcloud firewall add-rule --direction in --protocol tcp --port 10250 $FROM_CONTROL_PLANE --description 'Kubernetes kubelet API' kubernetes-control-plane
hcloud firewall add-rule --direction in --protocol tcp --port 50000 $FROM_ADMINISTRATION $FROM_CONTROL_PLANE --description 'Talos apid' kubernetes-control-plane
hcloud firewall add-rule --direction in --protocol tcp --port 50001 $FROM_WORKERS --description 'Talos trustd' kubernetes-control-plane
# Add any protocols that you forward from the load balancer here
hcloud firewall add-rule --direction in --protocol tcp --port 80 $FROM_LOAD_BALANCERS --description 'Forwarded: HTTP' kubernetes-worker
hcloud firewall add-rule --direction in --protocol tcp --port 443 $FROM_LOAD_BALANCERS --description 'Forwarded: HTTPS' kubernetes-worker
hcloud firewall add-rule --direction in --protocol udp --port 443 $FROM_LOAD_BALANCERS --description 'Forwarded: HTTP/3' kubernetes-worker
hcloud firewall add-rule --direction in --protocol tcp --port 10250 $FROM_CONTROL_PLANE --description 'Kubernetes kubelet API' kubernetes-worker
hcloud firewall add-rule --direction in --protocol tcp --port 50000 $FROM_CONTROL_PLANE --description 'Talos apid' kubernetes-worker
# Optional: Allow ping for manual checks
hcloud firewall add-rule --direction in --protocol icmp $FROM_LOAD_BALANCERS --description 'Ping' kubernetes-control-plane
hcloud firewall add-rule --direction in --protocol icmp $FROM_LOAD_BALANCERS --description 'Ping' kubernetes-worker

Note: From now on, you can expect to see errors showing up in every stage until we are done. In particular:

  • The nameservers are set to IPv4 addresses in the snapshot
  • Because certificate rotation is enabled, the kubelets won’t have valid serving certificates until the CSR approver is deployed (and that’s the last step).
  • Some errors appear and disappear as requests are sent to pending components

Load balancing using HAProxy

As Hetzner Cloud load balancers support neither IPv6 targets nor UDP, we’ll create our own load balancers. Here I am using HAProxy with Docker Compose for a simple and reliable setup.

The following sections should be followed on each of your load balancers (via SSH).
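
For example, an SSH session can be opened directly via the hcloud CLI (it uses the SSH key created earlier):

hcloud server ssh load-balancer-1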

Configure floating IP

To use the floating IPs we assigned, we need to add them to the network configuration:

# Beware this requires access to the Hetzner API
# Alternatively copy the floating IPs (including netmask) to the variables or directly in the configuration file
export FLOATING_IPV4="$(hcloud floating-ip list -o columns=ip,name | grep load-balancer-ipv4 | awk '{print $1"/32"}')"
export FLOATING_IPV6="$(hcloud floating-ip list -o columns=ip,name | grep load-balancer-ipv6 | awk '{print $1}' | sed 's|/64|1/64|')"
sudo sed -i "/^      addresses:/a\      - $FLOATING_IPV4\n      - $FLOATING_IPV6" /etc/netplan/50-cloud-init.yaml
# Review content
sudo cat /etc/netplan/50-cloud-init.yaml
# Confirm settings by pressing enter
sudo netplan try
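
A quick check that the floating addresses were actually applied (eth0 is the default interface name on Hetzner Cloud):

# The floating IPv4 and IPv6 addresses should show up alongside the primary ones
ip -brief address show dev eth0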

Internal load balancer

The internal load balancer just forwards the Kubernetes API (HTTPS via IPv6). The HTTP health check can be enabled after the cluster has been bootstrapped.

# 'hcloud' requires API access; alternatively copy IP address
export NODE1_IPV6="$(hcloud server ip -6 node-1)"
cat > internal-haproxy.cfg <<EOF
global
    log stdout format raw local0
    maxconn 4096

defaults
    log     global
    mode    tcp
    option  tcplog
    option  dontlognull
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    retries 3
    option  redispatch

frontend kubernetes-api
    bind [::]:6443
    mode tcp

    default_backend kubernetes-api-server

backend kubernetes-api-server
    mode    tcp
    balance roundrobin

    timeout check 3s

    default-server check inter 3s downinter 1s fall 2
    server node-1 [$NODE1_IPV6]:6443
EOF

It’s important to only add / uncomment new servers once they have fully joined the cluster. Otherwise, requests may hit unhealthy nodes.
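
Once the cluster is bootstrapped, the plain TCP check can be replaced with the HTTP health check mentioned above. A sketch of the lines to adjust in the kubernetes-api-server backend, probing the API server's /readyz endpoint (if anonymous access to it is disabled, the API server answers 401 while still being up, hence the relaxed expectation):

    # Probe the API server over TLS instead of a plain TCP connect
    option httpchk GET /readyz
    http-check expect rstatus ^(200|401)$
    default-server check check-ssl verify none inter 3s downinter 1s fall 2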

External load balancer

The external load balancer will forward traffic from users (HTTP, HTTPS and HTTP/3, via IPv4 & IPv6). You can add more front- and backends here if you want to load balance additional services.

# 'hcloud' requires API access; alternatively copy IP addresses
export NODE1_IPV6="$(hcloud server ip -6 node-1)"
export NODE2_IPV6="$(hcloud server ip -6 node-2)"
export NODE3_IPV6="$(hcloud server ip -6 node-3)"
cat > external-haproxy.cfg <<EOF
global
    log stdout format raw local0
    maxconn 10000

defaults
    log     global
    option  dontlognull
    timeout connect 5s
    timeout client  60s
    timeout server  60s
    retries 3
    option  redispatch

frontend http
    bind *:80
    mode http
    option httplog

    default_backend kubernetes_http

frontend https
    bind *:443
    mode tcp
    option tcplog

    default_backend kubernetes_https

frontend https3
    bind *:443
    mode udp

    default_backend kubernetes_https3

backend kubernetes_http
    mode    http
    balance roundrobin

    timeout check 3s

    default-server check inter 3s downinter 1s fall 2
    server node-1 [$NODE1_IPV6]:80
    server node-2 [$NODE2_IPV6]:80
    server node-3 [$NODE3_IPV6]:80

backend kubernetes_https
    mode    tcp
    balance roundrobin

    timeout check 3s

    default-server check inter 3s downinter 1s fall 2
    server node-1 [$NODE1_IPV6]:443
    server node-2 [$NODE2_IPV6]:443
    server node-3 [$NODE3_IPV6]:443

backend kubernetes_https3
    mode    udp
    # QUIC connections must stay on the same backend
    balance source

    timeout check 3s

    default-server check inter 3s downinter 1s fall 2
    server node-1 [$NODE1_IPV6]:443
    server node-2 [$NODE2_IPV6]:443
    server node-3 [$NODE3_IPV6]:443
EOF

Docker compose

Install Docker Compose using the official installation instructions. I am using Docker Hardened Images here. To pull them, you need to log into your Docker account with docker login dhi.io, using a personal access token (public read-only permission) as the password. Alternatively, you can use the official HAProxy image.

# Use a PAT as the password
docker login dhi.io
cat > compose.yaml <<EOF
services:
  external-haproxy:
    image: dhi.io/haproxy:3.3.4-debian13
    command:
      - -f
      - /usr/local/etc/haproxy
    network_mode: host
    volumes:
      - ./external-haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
    restart: always
    read_only: true
  internal-haproxy:
    image: dhi.io/haproxy:3.3.4-debian13
    command:
      - -f
      - /usr/local/etc/haproxy
    network_mode: host
    volumes:
      - ./internal-haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
    restart: always
    read_only: true
EOF
docker compose up -d
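
A quick sanity check that both proxies came up and bound their ports:

# Both containers should be running
docker compose ps
# HAProxy reports configuration problems at startup
docker compose logs
# Ports 80, 443 and 6443 should be listening
ss -tlnp | grep -E ':(80|443|6443)'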

Talos machine configuration

Talos is configured using a machine configuration. In the future, Talos will use separate configuration resources instead. I split the configuration into multiple patches for clarity.

Note on KubePrism

If you wish to use KubePrism, remove no-kube-prism.yaml from the patches folder and change k8sServiceHost and k8sServicePort as described in the Cilium configuration. I have disabled KubePrism for three reasons:

  • Even though the Hetzner Cloud instances officially don’t have an IPv4 address, they are assigned a utility IPv4 address. This IPv4 is picked up and preferred by the Talos discovery service, causing talosctl health to fail.
  • Since the deprecation of the Kubernetes discovery feature, the only two alternatives are the public discovery service hosted by Sidero Labs or self-hosting the discovery service, which requires a business license.
  • The discovery service lags behind the actual state of the cluster, showing removed nodes for some time.

As KubePrism doesn’t work properly with my infrastructure, I have opted to disable it. On top of that, there is no truly free self-hosted option for the discovery service, and the benefits of KubePrism are not essential.

mkdir patches
cat > patches/certificate-rotation.yaml <<EOF
machine:
  kubelet:
    extraArgs:
      rotate-server-certificates: true
EOF
cat > patches/cilium.yaml <<EOF
machine:
  features:
    hostDNS:
      # Doesn't work with Cilium CNI
      forwardKubeDNSToHost: false
cluster:
  network:
    cni:
      name: none
  proxy:
    disabled: true
EOF
cat > patches/talos-ccm.yaml <<EOF
machine:
  features:
    kubernetesTalosAPIAccess:
      enabled: true
      allowedRoles:
        - os:reader
      allowedKubernetesNamespaces:
        - kube-system
  kubelet:
    extraArgs:
      cloud-provider: external
EOF
cat > patches/hetzner-ntp.yaml <<EOF
machine:
  time:
    servers:
      - ntp1.hetzner.de
      - ntp2.hetzner.com
      - ntp3.hetzner.net
      - time.cloudflare.com
EOF
cat > patches/ipv6-single-stack.yaml <<EOF
machine:
  kubelet:
    nodeIP:
      validSubnets:
        - 2000::/3
    extraConfig:
      address: "::"
#      clusterDNS: []
cluster:
  # IPAM is handled by Talos CCM, so the subnets are ignored. Removes the default IPv4 settings.
  network:
    podSubnets:
      - "fd40:10::/96"
    serviceSubnets:
      - "fd40:10:100::/112"
  apiServer:
    extraArgs:
      bind-address: "::"
  controllerManager:
    extraArgs:
      bind-address: "::"
      # IPAM is handled by Talos CCM, so the mask size is ignored.
      node-cidr-mask-size-ipv6: "112"
      # Disable Node IPAM controller as we use Talos CCM for the pod network for each node.
      controllers: "*,-node-ipam-controller"
  scheduler:
    extraArgs:
      bind-address: "::"
  etcd:
    advertisedSubnets:
      - 2000::/3
    listenSubnets:
      - 2000::/3
#    extraArgs:
#      listen-metrics-urls: "http://[::]:2381"
EOF
cat > patches/nat64.yaml <<EOF
machine:
  network:
    nameservers:
      # Public NAT64: https://nat64.net
      - 2a00:1098:2b::1
      - 2a01:4f8:c2c:123f::1
      - 2a01:4f9:c010:3f02::1
EOF
cat > patches/no-kube-prism.yaml <<EOF
machine:
  features:
    kubePrism:
      # Requires discovery
      enabled: false
cluster:
  discovery:
    # Discovery prefers IPv4 and requires public service (or self-hosting with business license)
    enabled: false
EOF

CLUSTER_NAME="talos-ipv6-single-stack"
export NODE1_IPV6="$(hcloud server ip -6 node-1)"
export FLOATING_IPV6="$(hcloud floating-ip list -o columns=ip,name | grep load-balancer-ipv6 | awk '{print $1}' | sed 's|/64|1|')"
ENDPOINT="https://[$FLOATING_IPV6]:6443"
talosctl gen secrets
# Generate configuration based on secrets and all patches
talosctl gen config --with-secrets secrets.yaml $(ls patches | sed 's|^|--config-patch=@patches/|') "$CLUSTER_NAME" "$ENDPOINT"
mkdir -p ~/.talos
cp talosconfig ~/.talos/config
talosctl config endpoint "$NODE1_IPV6"
talosctl config node "$NODE1_IPV6"
# The '--insecure' flag is required when applying configuration for the first time, as there is no established trust yet
talosctl apply-config --insecure -n "$NODE1_IPV6" --file controlplane.yaml
# Wait for 'KUBELET' status to be 'Healthy' on the dashboard (then exit using CTRL-C)
talosctl dashboard
talosctl bootstrap
# Wait until messages like "created /v1/Service/kube-dns" appear; ignore "remote error: tls: internal error" messages
talosctl dashboard
talosctl kubeconfig
# Remove control-plane taint, if you want to use the control-plane node as a worker as well
kubectl taint node node-1 node-role.kubernetes.io/control-plane:NoSchedule-

At this point you should be able to access the API server, e.g. kubectl get nodes
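
To confirm the node registered with its IPv6 address (and not the utility IPv4 mentioned earlier), check the INTERNAL-IP column:

# INTERNAL-IP should show the node's global IPv6 address
kubectl get nodes -o wide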

Talos cloud controller manager

The Talos CCM acts as the IPAM (IP address management) controller and assigns each node a pod CIDR (IP range; /80) that fits into the node’s netmask (/64). This allows any router to understand which node the traffic for any given pod belongs to, without NAT.

cat > talos-ccm.yaml <<EOF
enabledControllers:
  - cloud-node
  - node-ipam-controller
extraArgs:
  - --allocate-node-cidrs
  - --cidr-allocator-type=CloudAllocator
  - --node-cidr-mask-size-ipv6=80
daemonSet:
  enabled: true
EOF
helm upgrade -i talos-cloud-controller-manager oci://ghcr.io/siderolabs/charts/talos-cloud-controller-manager -n kube-system -f talos-ccm.yaml

The Talos cloud controller manager pod should now be running, with only the CoreDNS pods still pending: kubectl get pods -A
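
To verify the per-node /80 pod CIDRs described above have been allocated, a quick check:

# Each node should list an IPv6 pod CIDR with an /80 prefix carved out of its own /64
kubectl get nodes -o custom-columns='NAME:.metadata.name,PODCIDRS:.spec.podCIDRs'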

Cilium

I use Cilium as the CNI with kube-proxy replacement and GatewayAPI enabled.

export FLOATING_IPV6="$(hcloud floating-ip list -o columns=ip,name | grep load-balancer-ipv6 | awk '{print $1}' | sed 's|/64|1|')"
cat > cilium.yaml <<EOF
operator:
  replicas: 1
# Use IPv6
ipv4:
  enabled: false
ipv6:
  enabled: true
k8s:
  requireIPv4PodCIDR: false
  requireIPv6PodCIDR: true
enableIPv6Masquerade: false
enableIPv4Masquerade: false
# Talos specific
# see: https://docs.siderolabs.com/kubernetes-guides/cni/deploying-cilium
ipam:
  mode: kubernetes
cgroup:
  autoMount:
    enabled: false
  hostRoot: /sys/fs/cgroup
securityContext:
  capabilities:
    ciliumAgent:
      - CHOWN
      - KILL
      - NET_ADMIN
      - NET_RAW
      - IPC_LOCK
      - SYS_ADMIN
      - SYS_RESOURCE
      - DAC_OVERRIDE
      - FOWNER
      - SETGID
      - SETUID
    cleanCiliumState:
      - NET_ADMIN
      - SYS_ADMIN
      - SYS_RESOURCE
# Performance
routingMode: native
loadBalancer:
  acceleration: native
# GatewayAPI
gatewayAPI:
  enabled: true
  enableAlpn: true
  enableAppProtocol: true
# kube-proxy replacement
# If enabled, requires k8sServiceHost and k8sServicePort
kubeProxyReplacement: true
# When using KubePrism (a Talos feature - see machine config)
#k8sServiceHost: localhost
#k8sServicePort: 7445
# Otherwise
k8sServiceHost: $FLOATING_IPV6
k8sServicePort: 6443
EOF
helm repo add cilium https://helm.cilium.io/
# IMPORTANT: Do not change Cilium version until the Talos kernel is updated to a patched version
# see: https://github.com/cilium/cilium/issues/44216
# see: https://github.com/siderolabs/talos/issues/12331
helm upgrade -i cilium cilium/cilium -n kube-system -f cilium.yaml --version 1.18.7

At this point the node should have no taints (except node-role.kubernetes.io/control-plane:NoSchedule if you haven’t removed it). All pods should be running: kubectl get pods -A
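
If you have the Cilium CLI installed, it gives a more detailed readiness view than kubectl alone:

# Optional: wait until the agent and operator report ready
cilium status --wait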

CSR approver

Because we enabled kubelet certificate rotation in the machine configuration, we need to approve certificate signing requests (CSRs). We’ll do that in an automated way, since the certificates are rotated regularly.
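
The approver automates what you could otherwise do by hand. For a one-off check (or manual approval while debugging), the plain kubectl commands look like this - the CSR name is a placeholder:

# Kubelet serving CSRs use the 'kubernetes.io/kubelet-serving' signer
kubectl get csr
# Manually approve a single pending request (name is an example)
kubectl certificate approve csr-xxxxx

The automated setup with kubelet-csr-approver: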

cat > kubelet-csr-approver.yaml <<EOF
replicas: 1
providerRegex: '^node-[0-9]+$'
# DNS resolution is bypassed, as the hostnames (e.g. 'node-1') don't have a DNS entry
bypassDnsResolution: true
EOF
helm repo add kubelet-csr-approver https://postfinance.github.io/kubelet-csr-approver
helm upgrade -i kubelet-csr-approver kubelet-csr-approver/kubelet-csr-approver -n kube-system -f kubelet-csr-approver.yaml

That’s it for the first node and the cluster bootstrap! You should now have a working IPv6 Talos cluster. Run talosctl health for a quick check.

Joining nodes

Beware that newly provisioned nodes need to be added to the firewall rulesets. To join the remaining nodes, follow this process:

export NODE_NAME="<insert-name-here>"
export NODE_IPV6="$(hcloud server ip -6 $NODE_NAME)"
# Use 'worker.yaml' for non-control-plane nodes
talosctl apply-config --insecure --nodes $NODE_IPV6 --file controlplane.yaml
# Remove control-plane taint, if you want to use the control-plane node as a worker as well
kubectl taint node $NODE_NAME node-role.kubernetes.io/control-plane:NoSchedule-
# If the node is a control-plane, add it to the internal load balancer
echo "    server $NODE_NAME [$NODE_IPV6]:6443" >> internal-haproxy.cfg
docker compose restart internal-haproxy

Run talosctl etcd members to confirm that the nodes joined. Do not add the node to the load balancer before it has successfully joined the cluster.
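
As noted above, new nodes also need to appear in the source-IP firewall rules. The label-based firewall attachments apply to new servers automatically, but rules like the following have to be extended - a sketch for a hypothetical node name, to be adapted to the node's role:

# Example: allow the new node to reach the Kubernetes API via the internal load balancer
NEW_NODE_IPV6="$(hcloud server ip -6 node-4)"
hcloud firewall add-rule --direction in --protocol tcp --port 6443 --source-ips "$NEW_NODE_IPV6/128" --description 'Kubernetes API server (new node)' internal-load-balancer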

Troubleshooting

If you are running into issues:

  • Consider whether the problem is temporary
    • During each stage of the bootstrapping process you will see errors until the CSR approver is installed
  • Check talosctl services
  • Check talosctl etcd members
  • Check talosctl get nodeip
  • Check talosctl logs etcd
  • Check talosctl logs kubelet
  • Check talosctl dashboard
  • Check talosctl health
  • Read the official troubleshooting guide
  • Check traffic e.g. talosctl pcap -o - | tcpdump -r - 'tcp port 2379'
  • On the load-balancer, check the HAProxy logs: docker compose logs
  • Rebuild your nodes and try again

Further reading