Building an IPv6-Only Kubernetes Cluster with Talos on Hetzner Cloud
The Goal
- IPv6 single-stack Talos multi-node cluster
- Certificate rotation (mTLS)
- Reachable via IPv6 and IPv4 (inbound traffic)
- Access IPv6 and IPv4 resources (outbound traffic)
- TCP & UDP load balancing
- HTTP/1.1 & HTTP/2 GatewayAPI
- HTTP/3 support depends on https://github.com/cilium/cilium/issues/28497
- TLS pass-through
Infrastructure
Before we can install Talos, we need to create the cloud infrastructure. Here I am using the Hetzner Cloud offering for affordable hosting. Of course, this will also work on any other cloud provider that offers firewalls, or even in your home lab.
Once you’ve got your API token (read & write), you can manage your Hetzner resources from your PC:
# Install Hetzner CLI tool
brew install hcloud
# Temporarily define Hetzner API key
read -rp 'Hetzner API key: ' HCLOUD_TOKEN
export HCLOUD_TOKEN
# Make sure it works
hcloud server list

Creating a Talos snapshot
Before we can get the servers, we need to prepare a Talos snapshot to base the servers on. While Hetzner does provide Talos ISOs, this ensures you’ll get the latest / desired version right away. To create this snapshot, we’ll make use of the Talos image factory and HashiCorp Packer:
# Install HashiCorp Packer
brew install hashicorp/tap/packer
# Optional: Define a GitHub personal access token to avoid rate-limiting errors
# Visit https://github.com/settings/personal-access-tokens
# and create a fine-grained token with read-only access to public repositories
read -rp 'GitHub PAT: ' PACKER_GITHUB_API_TOKEN
export PACKER_GITHUB_API_TOKEN
# Define the steps needed to create the snapshot
cat > talos.pkr.hcl <<'EOF'
packer {
required_plugins {
hcloud = {
source = "github.com/hetznercloud/hcloud"
version = "~> 1"
}
}
}
variable "talos_version" {
type = string
# Change to the latest / desired version
default = "v1.12.4"
}
variable "arch" {
type = string
default = "amd64"
}
variable "server_type" {
type = string
default = "cx23"
}
variable "server_location" {
type = string
default = "nbg1"
}
locals {
# Default Hetzner Cloud image from https://factory.talos.dev
image = "https://factory.talos.dev/image/376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba/${var.talos_version}/hcloud-${var.arch}.raw.xz"
}
source "hcloud" "talos" {
rescue = "linux64"
image = "debian-13"
location = "${var.server_location}"
server_type = "${var.server_type}"
public_ipv4_disabled = true
ssh_username = "root"
snapshot_name = "Talos ${var.talos_version} ${var.arch}"
snapshot_labels = {
type = "infra",
os = "talos",
version = "${var.talos_version}",
arch = "${var.arch}",
}
}
build {
sources = ["source.hcloud.talos"]
provisioner "shell" {
inline = [
# Use NAT64 (https://nat64.net) to resolve 'factory.talos.dev': https://github.com/siderolabs/image-factory/issues/60
"echo 'DNS=2a00:1098:2b::1 2a01:4f8:c2c:123f::1 2a01:4f9:c010:3f02::1' >> /etc/systemd/resolved.conf",
"service systemd-resolved restart",
"wget --inet6-only --progress=dot:giga -O /tmp/talos.raw.xz ${local.image}",
"echo \"$(date +%H:%M:%S) | Decompressing and writing image (takes about 5 minutes)...\"",
"xz -d -c /tmp/talos.raw.xz | dd of=/dev/sda && sync",
]
}
}
EOF
# Create the snapshot (takes about 10 minutes)
packer init .
packer build .

When using another cloud provider, check whether they offer Talos Linux directly or provide a HashiCorp Packer plugin. On dedicated servers, follow the official installation instructions. When using Proxmox, mind the MTU requirements of Talos and the MTU restrictions of your network.
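Once Packer finishes, you can check that the snapshot exists and carries the labels defined in talos.pkr.hcl. A sketch; it defaults to a dry run that only prints the hcloud command:

```shell
# Dry run by default; set HCLOUD=hcloud to actually query the API.
HCLOUD="${HCLOUD:-echo hcloud}"
# The label selector matches the snapshot_labels set in talos.pkr.hcl
$HCLOUD image list -t snapshot -l os=talos
```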
Ordering the servers
Now it’s time to get the servers!
You can start out with one load balancer and one node (or even just the node) for testing or go all in.
Choose the server types that fit your workload and budget. Run hcloud server-type list to see the available options.
Using floating IPs is optional but allows you to switch over traffic to another load balancer (e.g. for maintenance) without needing to wait for DNS changes to propagate.
The API endpoint URL of a Kubernetes cluster cannot be changed after creation. For better availability, IP failover could be configured to switch between multiple load balancers.
This is usually achieved using keepalived, which uses VRRP (layer 2). Because Hetzner Cloud only provides us with layer 3 network access, VRRP can’t be used here - so I’ll leave IP failover as a future topic.
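Without VRRP, a failover on Hetzner Cloud boils down to reassigning the floating IPs. A minimal sketch, assuming a hypothetical standby host named load-balancer-2 (this guide only creates one load balancer); it defaults to a dry run that only prints the hcloud commands:

```shell
# Dry run by default; set HCLOUD=hcloud to execute the commands for real.
HCLOUD="${HCLOUD:-echo hcloud}"

failover_to() {
  # Reassigning both floating IPs moves inbound traffic to the standby
  # without waiting for DNS propagation.
  $HCLOUD floating-ip assign load-balancer-ipv4 "$1"
  $HCLOUD floating-ip assign load-balancer-ipv6 "$1"
}

failover_to load-balancer-2
```

Note that the standby must already have the floating IPs in its local network configuration (see "Configure floating IP" below) to accept the traffic.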
Note that Talos nodes start with an initially unauthenticated management port (50000). So make sure a firewall is in place before you start them!
# Create SSH key (if not already created)
# I'm using the key on the Talos instances as well even though it has no effect there because it avoids getting a root password email on each rebuild
hcloud ssh-key create --name ssh-key --public-key-from-file ~/.ssh/id_ed25519.pub
# Create firewall ruleset to allow SSH for maintenance
hcloud firewall create --name accept-ssh
hcloud firewall add-rule --direction in --protocol tcp --port 22 --source-ips '0.0.0.0/0' --source-ips '::/0' --description 'SSH' accept-ssh
# Create firewall ruleset for the internal load-balancer
hcloud firewall create --name internal-load-balancer
hcloud firewall apply-to-resource --type label_selector --label-selector 'type=load-balancer' internal-load-balancer
# Create firewall ruleset for the external load-balancer
hcloud firewall create --name external-load-balancer
hcloud firewall apply-to-resource --type label_selector --label-selector 'type=load-balancer' external-load-balancer
hcloud firewall add-rule --direction in --protocol tcp --port 80 --source-ips 0.0.0.0/0 --source-ips ::/0 --description 'HTTP' external-load-balancer
hcloud firewall add-rule --direction in --protocol tcp --port 443 --source-ips 0.0.0.0/0 --source-ips ::/0 --description 'HTTPS' external-load-balancer
hcloud firewall add-rule --direction in --protocol udp --port 443 --source-ips 0.0.0.0/0 --source-ips ::/0 --description 'HTTP/3' external-load-balancer
# Create firewall ruleset for the control-plane nodes
hcloud firewall create --name kubernetes-control-plane
hcloud firewall apply-to-resource --type label_selector --label-selector 'kubernetes-control-plane-role' kubernetes-control-plane
# Optional: Create firewall ruleset for the worker nodes
hcloud firewall create --name kubernetes-worker
hcloud firewall apply-to-resource --type label_selector --label-selector 'kubernetes-worker-role' kubernetes-worker
# Create nodes
TALOS_SNAPSHOT="$(hcloud image list | grep -i talos -m1 | awk '{print $1}')"
hcloud server create --type cx33 --name node-1 --image "$TALOS_SNAPSHOT" --without-ipv4 --location nbg1 --label 'type=kubernetes-node' --label 'kubernetes-control-plane-role' --label 'kubernetes-worker-role' --ssh-key ssh-key --enable-protection rebuild,delete
hcloud server create --type cx33 --name node-2 --image "$TALOS_SNAPSHOT" --without-ipv4 --location fsn1 --label 'type=kubernetes-node' --label 'kubernetes-control-plane-role' --label 'kubernetes-worker-role' --ssh-key ssh-key --enable-protection rebuild,delete
hcloud server create --type cx33 --name node-3 --image "$TALOS_SNAPSHOT" --without-ipv4 --location hel1 --label 'type=kubernetes-node' --label 'kubernetes-control-plane-role' --label 'kubernetes-worker-role' --ssh-key ssh-key --enable-protection rebuild,delete
# Create load balancers
# The public IPv4 address is also required for the floating IPv4 address
hcloud server create --type cx23 --name load-balancer-1 --image ubuntu-24.04 --firewall accept-ssh --location nbg1 --label 'type=load-balancer' --ssh-key ssh-key --enable-protection rebuild,delete
# Create floating IPs
hcloud floating-ip create --type ipv4 --server load-balancer-1 --name load-balancer-ipv4 --description "Load balancer IPv4" --enable-protection delete
hcloud floating-ip create --type ipv6 --server load-balancer-1 --name load-balancer-ipv6 --description "Load balancer IPv6" --enable-protection delete
# Add server IPs to firewalls
export FROM_FLOATING_IPV6="--source-ips $(hcloud floating-ip list -o columns=ip,name | grep load-balancer-ipv6 | awk '{print $1}')"
export FROM_LOAD_BALANCERS="$FROM_FLOATING_IPV6 $(hcloud server list -o columns=ipv6 -o noheader -l type=load-balancer | sed 's|^|--source-ips |')"
# The IP addresses you're administering the cluster from (running kubectl, k9s, talosctl)
# Here I'm using the load balancer host to run CLI commands via SSH
export FROM_ADMINISTRATION="$FROM_LOAD_BALANCERS"
export FROM_NODES="$(hcloud server list -o columns=ipv6 -o noheader -l type=kubernetes-node | sed 's|^|--source-ips |')"
export FROM_CONTROL_PLANE="$(hcloud server list -o columns=ipv6 -o noheader -l kubernetes-control-plane-role | sed 's|^|--source-ips |')"
export FROM_WORKERS="$(hcloud server list -o columns=ipv6 -o noheader -l kubernetes-worker-role | sed 's|^|--source-ips |')"
hcloud firewall add-rule --direction in --protocol tcp --port 6443 $FROM_ADMINISTRATION $FROM_NODES --description 'Kubernetes API server' internal-load-balancer
hcloud firewall add-rule --direction in --protocol tcp --port 2379-2380 $FROM_CONTROL_PLANE --description 'Kubernetes etcd' kubernetes-control-plane
hcloud firewall add-rule --direction in --protocol tcp --port 6443 $FROM_LOAD_BALANCERS --description 'Kubernetes API server' kubernetes-control-plane
hcloud firewall add-rule --direction in --protocol tcp --port 10250 $FROM_CONTROL_PLANE --description 'Kubernetes kubelet API' kubernetes-control-plane
hcloud firewall add-rule --direction in --protocol tcp --port 50000 $FROM_ADMINISTRATION $FROM_CONTROL_PLANE --description 'Talos apid' kubernetes-control-plane
hcloud firewall add-rule --direction in --protocol tcp --port 50001 $FROM_WORKERS --description 'Talos trustd' kubernetes-control-plane
# Add any protocols that you forward from the load balancer here
hcloud firewall add-rule --direction in --protocol tcp --port 80 $FROM_LOAD_BALANCERS --description 'Forwarded: HTTP' kubernetes-worker
hcloud firewall add-rule --direction in --protocol tcp --port 443 $FROM_LOAD_BALANCERS --description 'Forwarded: HTTPS' kubernetes-worker
hcloud firewall add-rule --direction in --protocol udp --port 443 $FROM_LOAD_BALANCERS --description 'Forwarded: HTTP/3' kubernetes-worker
hcloud firewall add-rule --direction in --protocol tcp --port 10250 $FROM_CONTROL_PLANE --description 'Kubernetes kubelet API' kubernetes-worker
hcloud firewall add-rule --direction in --protocol tcp --port 50000 $FROM_CONTROL_PLANE --description 'Talos apid' kubernetes-worker
# Optional: Allow ping for manual checks
hcloud firewall add-rule --direction in --protocol icmp $FROM_LOAD_BALANCERS --description 'Ping' kubernetes-control-plane
hcloud firewall add-rule --direction in --protocol icmp $FROM_LOAD_BALANCERS --description 'Ping' kubernetes-worker

Note: From now on, you can expect to see errors showing up in every stage until we are done. In particular:
- The nameservers are set to IPv4 addresses in the snapshot
- Because certificate rotation is enabled, you won’t have a valid certificate for the Kubernetes API server until the CSR approver is deployed (and that’s the last step).
- Some errors appear and disappear as requests are sent to pending components
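As an aside on the firewall rules above: the --source-ips arguments are built from hcloud output with a small sed pipeline. A dry run with sample addresses (2001:db8::/32 is the IPv6 documentation prefix) shows what gets expanded; the variables are later used unquoted on purpose, so word splitting turns each line into an argument pair:

```shell
# Dry run of the '--source-ips' construction with sample addresses
# instead of live hcloud output.
sample_ips='2001:db8:1::/64
2001:db8:2::/64'
FROM_NODES="$(printf '%s\n' "$sample_ips" | sed 's|^|--source-ips |')"
printf '%s\n' "$FROM_NODES"
# --source-ips 2001:db8:1::/64
# --source-ips 2001:db8:2::/64
```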
Load balancing using HAProxy
As Hetzner Cloud load balancers support neither IPv6 targets nor UDP, we'll create our own load balancers. Here I am using HAProxy with Docker Compose for a simple and reliable setup.
The following sections should be followed on each of your load balancers (via SSH).
Configure floating IP
To use the floating IPs we assigned, we need to add them to the network configuration:
# Beware this requires access to the Hetzner API
# Alternatively copy the floating IPs (including netmask) to the variables or directly in the configuration file
export FLOATING_IPV4="$(hcloud floating-ip list -o columns=ip,name | grep load-balancer-ipv4 | awk '{print $1"/32"}')"
export FLOATING_IPV6="$(hcloud floating-ip list -o columns=ip,name | grep load-balancer-ipv6 | awk '{print $1}' | sed 's|/64|1/64|')"
sudo sed -i "/^ addresses:/a\ - $FLOATING_IPV4\n - $FLOATING_IPV6" /etc/netplan/50-cloud-init.yaml
# Review content
sudo cat /etc/netplan/50-cloud-init.yaml
# Confirm settings by pressing enter
sudo netplan try

Internal load balancer
The internal load balancer just forwards the Kubernetes API (HTTPS via IPv6). The HTTP health check can be enabled after the cluster has been bootstrapped.
# 'hcloud' requires API access; alternatively copy IP address
export NODE1_IPV6="$(hcloud server ip -6 node-1)"
cat > internal-haproxy.cfg <<EOF
global
log stdout format raw local0
maxconn 4096
defaults
log global
mode tcp
option tcplog
option dontlognull
timeout connect 5s
timeout client 30s
timeout server 30s
retries 3
option redispatch
frontend kubernetes-api
bind [::]:6443
mode tcp
default_backend kubernetes-api-server
backend kubernetes-api-server
mode tcp
balance roundrobin
timeout check 3s
default-server check inter 3s downinter 1s fall 2
server node-1 [$NODE1_IPV6]:6443
EOF

It's important to only add / uncomment new servers once they have fully joined the cluster. Otherwise requests may hit unhealthy nodes.
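Once the cluster is bootstrapped, the health check can be upgraded from a plain TCP connect to an HTTPS probe. A sketch of the adjusted backend stanza, under my assumptions: /version is served to unauthenticated clients by the default system:public-info-viewer role, and 'verify none' (skipping certificate validation) is acceptable for a liveness probe. Merge it into internal-haproxy.cfg, then reload with docker compose restart internal-haproxy:

```shell
# Sketch: health-checked variant of the kubernetes-api-server backend.
# The probe endpoint (/version) and 'verify none' are my assumptions, not
# from the original setup; review before merging into internal-haproxy.cfg.
cat > kubernetes-api-backend.snippet <<EOF
backend kubernetes-api-server
    mode tcp
    balance roundrobin
    option httpchk GET /version
    timeout check 3s
    default-server check check-ssl verify none inter 3s downinter 1s fall 2
    server node-1 [$NODE1_IPV6]:6443
EOF
```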
External load balancer
The external load balancer will forward traffic from users (HTTP via IPv4 & IPv6). You can add more front- and backends here if you want to load balance additional services.
# 'hcloud' requires API access; alternatively copy IP addresses
export NODE1_IPV6="$(hcloud server ip -6 node-1)"
export NODE2_IPV6="$(hcloud server ip -6 node-2)"
export NODE3_IPV6="$(hcloud server ip -6 node-3)"
cat > external-haproxy.cfg <<EOF
global
log stdout format raw local0
maxconn 10000
defaults
log global
option dontlognull
timeout connect 5s
timeout client 60s
timeout server 60s
retries 3
option redispatch
frontend http
bind *:80
mode http
option httplog
default_backend kubernetes_http
frontend https
bind *:443
mode tcp
option tcplog
default_backend kubernetes_https
frontend https3
bind *:443
mode udp
default_backend kubernetes_https3
backend kubernetes_http
mode http
balance roundrobin
timeout check 3s
default-server check inter 3s downinter 1s fall 2
server node-1 [$NODE1_IPV6]:80
server node-2 [$NODE2_IPV6]:80
server node-3 [$NODE3_IPV6]:80
backend kubernetes_https
mode tcp
balance roundrobin
timeout check 3s
default-server check inter 3s downinter 1s fall 2
server node-1 [$NODE1_IPV6]:443
server node-2 [$NODE2_IPV6]:443
server node-3 [$NODE3_IPV6]:443
backend kubernetes_https3
mode udp
# QUIC connections must stay on the same backend
balance source
timeout check 3s
default-server check inter 3s downinter 1s fall 2
server node-1 [$NODE1_IPV6]:443
server node-2 [$NODE2_IPV6]:443
server node-3 [$NODE3_IPV6]:443
EOF

Docker compose
Install Docker compose using the official installation instructions.
I am using Docker Hardened Images here.
To pull them, log into your Docker account with docker login dhi.io, using a personal access token (public read-only permission) as the password.
Alternatively, you can use the official HAProxy image.
# Use a PAT as the password
docker login dhi.io
cat > compose.yaml <<EOF
services:
external-haproxy:
image: dhi.io/haproxy:3.3.4-debian13
command:
- -f
- /usr/local/etc/haproxy
network_mode: host
volumes:
- ./external-haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
restart: always
read_only: true
internal-haproxy:
image: dhi.io/haproxy:3.3.4-debian13
command:
- -f
- /usr/local/etc/haproxy
network_mode: host
volumes:
- ./internal-haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
restart: always
read_only: true
EOF
docker compose up -d

Talos machine configuration
Talos is configured using machine configuration. In the future Talos will use separate configuration resources instead. I split the configuration into multiple patches for clarity.
Note on KubePrism
If you wish to use KubePrism, remove no-kube-prism.yaml from the patches folder and change k8sServiceHost and k8sServicePort as described in the Cilium configuration.
I have disabled KubePrism for three reasons:
- Even though the Hetzner Cloud instances officially don't have an IPv4 address, they are assigned a utility IPv4 address. This IPv4 is picked up and preferred by the Talos discovery service, causing talosctl health to fail.
- Since the deprecation of the Kubernetes discovery feature, the only two alternatives are:
  - Using SideroLabs' official discovery service at https://discovery.talos.dev/ (default)
  - Getting a license to self-host the discovery service
- The discovery service lags behind the actual state of the cluster, showing removed nodes for some time.
As KubePrism doesn't work properly with my infrastructure, I have opted to disable it. On top of that, there is no free self-hosted option, and the benefits of KubePrism are not essential.
mkdir patches
cat > patches/certificate-rotation.yaml <<EOF
machine:
kubelet:
extraArgs:
rotate-server-certificates: true
EOF
cat > patches/cilium.yaml <<EOF
machine:
features:
hostDNS:
# Doesn't work with Cilium CNI
forwardKubeDNSToHost: false
cluster:
network:
cni:
name: none
proxy:
disabled: true
EOF
cat > patches/talos-ccm.yaml <<EOF
machine:
features:
kubernetesTalosAPIAccess:
enabled: true
allowedRoles:
- os:reader
allowedKubernetesNamespaces:
- kube-system
kubelet:
extraArgs:
cloud-provider: external
EOF
cat > patches/hetzner-ntp.yaml <<EOF
machine:
time:
servers:
- ntp1.hetzner.de
- ntp2.hetzner.com
- ntp3.hetzner.net
- time.cloudflare.com
EOF
cat > patches/ipv6-single-stack.yaml <<EOF
machine:
kubelet:
nodeIP:
validSubnets:
- 2000::/3
extraConfig:
address: "::"
# clusterDNS: []
cluster:
# IPAM is handled by Talos CCM, so the subnets are ignored. Removes the default IPv4 settings.
network:
podSubnets:
- "fd40:10::/96"
serviceSubnets:
- "fd40:10:100::/112"
apiServer:
extraArgs:
bind-address: "::"
controllerManager:
extraArgs:
bind-address: "::"
# IPAM is handled by Talos CCM, so the mask size is ignored.
node-cidr-mask-size-ipv6: "112"
# Disable Node IPAM controller as we use Talos CCM for the pod network for each node.
controllers: "*,-node-ipam-controller"
scheduler:
extraArgs:
bind-address: "::"
etcd:
advertisedSubnets:
- 2000::/3
listenSubnets:
- 2000::/3
# extraArgs:
# listen-metrics-urls: "http://[::]:2381"
EOF
cat > patches/nat64.yaml <<EOF
machine:
network:
nameservers:
# Public NAT64: https://nat64.net
- 2a00:1098:2b::1
- 2a01:4f8:c2c:123f::1
- 2a01:4f9:c010:3f02::1
EOF
cat > patches/no-kube-prism.yaml <<EOF
machine:
features:
kubePrism:
# Requires discovery
enabled: false
cluster:
discovery:
# Discovery prefers IPv4 and requires public service (or self-hosting with business license)
enabled: false
EOF
CLUSTER_NAME="talos-ipv6-single-stack"
export NODE1_IPV6="$(hcloud server ip -6 node-1)"
export FLOATING_IPV6="$(hcloud floating-ip list -o columns=ip,name | grep load-balancer-ipv6 | awk '{print $1}' | sed 's|/64|1|')"
ENDPOINT="https://[$FLOATING_IPV6]:6443"
talosctl gen secrets
# Generate configuration based on secrets and all patches
talosctl gen config --with-secrets secrets.yaml $(ls patches | sed 's|^|--config-patch @patches/|') "$CLUSTER_NAME" "$ENDPOINT"
cp talosconfig ~/.talos/config
talosctl config endpoint "$NODE1_IPV6"
talosctl config node "$NODE1_IPV6"
# The '--insecure' flag is required when applying configuration for the first time, as there is no established trust yet
talosctl apply-config --insecure -n "$NODE1_IPV6" --file controlplane.yaml
# Wait for 'KUBELET' status to be 'Healthy' on the dashboard (then exit using CTRL-C)
talosctl dashboard
talosctl bootstrap
# Wait until messages like "created /v1/Service/kube-dns" appear; ignore "remote error: tls: internal error" messages
talosctl dashboard
talosctl kubeconfig
# Remove control-plane taint, if you want to use the control-plane node as a worker as well
kubectl taint node node-1 node-role.kubernetes.io/control-plane:NoSchedule-

At this point you should be able to access the API server, e.g. kubectl get nodes
Talos cloud controller manager
The Talos CCM acts as the IPAM (IP address management) controller and assigns each node a pod CIDR (a /80 IP range) that fits into the node's /64 netmask. This allows any router to understand which node the traffic for any given pod belongs to, without NAT.
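A quick sanity check on the sizing: a /80 fits comfortably inside a /64, with room to spare:

```shell
# Each node's /64 contains 2^(80-64) = 65536 possible /80 sub-ranges,
# and a single /80 still holds 2^48 pod addresses.
subranges=$(( 1 << (80 - 64) ))
echo "$subranges"   # prints 65536
```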
cat > talos-ccm.yaml <<EOF
enabledControllers:
- cloud-node
- node-ipam-controller
extraArgs:
- --allocate-node-cidrs
- --cidr-allocator-type=CloudAllocator
- --node-cidr-mask-size-ipv6=80
daemonSet:
enabled: true
EOF
helm upgrade -i talos-cloud-controller-manager oci://ghcr.io/siderolabs/charts/talos-cloud-controller-manager -n kube-system -f talos-ccm.yaml

The Talos cloud controller manager pod should be running, with only the CoreDNS pods pending: kubectl get pods -A
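You can also confirm that each node received its /80 pod CIDR. A sketch; it defaults to a dry run that only prints the kubectl command:

```shell
# Dry run by default; set KUBECTL=kubectl to query the real cluster.
KUBECTL="${KUBECTL:-echo kubectl}"
show_pod_cidrs() {
  # .spec.podCIDRs holds the ranges the IPAM controller assigned per node
  $KUBECTL get nodes -o custom-columns=NAME:.metadata.name,PODCIDRS:.spec.podCIDRs
}
show_pod_cidrs
```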
Cilium
I use Cilium as the CNI with kube-proxy replacement and GatewayAPI enabled.
export FLOATING_IPV6="$(hcloud floating-ip list -o columns=ip,name | grep load-balancer-ipv6 | awk '{print $1}' | sed 's|/64|1|')"
cat > cilium.yaml <<EOF
operator:
replicas: 1
# Use IPv6
ipv4:
enabled: false
ipv6:
enabled: true
k8s:
requireIPv4PodCIDR: false
requireIPv6PodCIDR: true
enableIPv6Masquerade: false
enableIPv4Masquerade: false
# Talos specific
# see: https://docs.siderolabs.com/kubernetes-guides/cni/deploying-cilium
ipam:
mode: kubernetes
cgroup:
autoMount:
enabled: false
hostRoot: /sys/fs/cgroup
securityContext:
capabilities:
ciliumAgent:
- CHOWN
- KILL
- NET_ADMIN
- NET_RAW
- IPC_LOCK
- SYS_ADMIN
- SYS_RESOURCE
- DAC_OVERRIDE
- FOWNER
- SETGID
- SETUID
cleanCiliumState:
- NET_ADMIN
- SYS_ADMIN
- SYS_RESOURCE
# Performance
routingMode: native
loadBalancer:
acceleration: native
# GatewayAPI
gatewayAPI:
enabled: true
enableAlpn: true
enableAppProtocol: true
# kube-proxy replacement
# If enabled, requires k8sServiceHost and k8sServicePort
kubeProxyReplacement: true
# When using KubePrism (a Talos feature - see machine config)
#k8sServiceHost: localhost
#k8sServicePort: 7445
# Otherwise
k8sServiceHost: $FLOATING_IPV6
k8sServicePort: 6443
EOF
helm repo add cilium https://helm.cilium.io/
# IMPORTANT: Do not change Cilium version until the Talos kernel is updated to a patched version
# see: https://github.com/cilium/cilium/issues/44216
# see: https://github.com/siderolabs/talos/issues/12331
helm upgrade -i cilium cilium/cilium -n kube-system -f cilium.yaml --version 1.18.7

At this point the node should have no taints (except node-role.kubernetes.io/control-plane:NoSchedule if you haven't removed it).
All pods should be running: kubectl get pods -A
CSR approver
Because we enabled kubelet certificate rotation in the machine configuration, we need to approve certificate signing requests (CSRs). We’ll do that in an automated way, since the certificates are rotated regularly.
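For reference, a rotation could also be handled by hand before the approver is running. A sketch; it defaults to a dry run that only prints the commands, and the CSR name is a placeholder (real names are random):

```shell
# Dry run by default; set KUBECTL=kubectl to execute against the cluster.
KUBECTL="${KUBECTL:-echo kubectl}"
# List outstanding CSRs; kubelet serving certificates use the
# 'kubernetes.io/kubelet-serving' signer.
$KUBECTL get csr
# Approve one by name; 'csr-example' is a hypothetical placeholder.
$KUBECTL certificate approve csr-example
```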
cat > kubelet-csr-approver.yaml <<EOF
replicas: 1
providerRegex: '^node-[0-9]+$'
# Without this, hostnames (e.g. 'node-1') would need a DNS entry
bypassDnsResolution: true
EOF
helm repo add kubelet-csr-approver https://postfinance.github.io/kubelet-csr-approver
helm upgrade -i kubelet-csr-approver kubelet-csr-approver/kubelet-csr-approver -n kube-system -f kubelet-csr-approver.yaml

That's it for the first node and cluster bootstrap!
Now you should have a working IPv6 Talos cluster. Run talosctl health for a quick check.
Joining nodes
Beware that newly provisioned nodes need to be added to the firewall rulesets. To join the remaining nodes, follow this process:
export NODE_NAME="<insert-name-here>"
export NODE_IPV6="$(hcloud server ip -6 $NODE_NAME)"
# Use 'worker.yaml' for non-control-plane nodes
talosctl apply-config --insecure --nodes $NODE_IPV6 --file controlplane.yaml
# Remove control-plane taint, if you want to use the control-plane node as a worker as well
kubectl taint node $NODE_NAME node-role.kubernetes.io/control-plane:NoSchedule-
# If the node is a control-plane, add it to the internal load balancer
echo " server $NODE_NAME [$NODE_IPV6]:6443" >> internal-haproxy.cfg
docker compose restart internal-haproxy

Run talosctl etcd members to confirm that the nodes joined.
Do not add the node to the load balancer before it has successfully joined the cluster.
Troubleshooting
If you are running into issues:
- Consider whether the problem is temporary
  - During each stage of the bootstrapping process you will see errors until the kubelet-csr-approver is installed
- Check talosctl services
- Check talosctl etcd members
- Check talosctl get nodeip
- Check talosctl logs etcd
- Check talosctl logs kubelet
- Check talosctl dashboard
- Check talosctl health
- Read the official troubleshooting guide
- Check traffic, e.g. talosctl pcap -o - | tcpdump -r - 'tcp port 2379'
- On the load balancer, check the HAProxy logs: docker compose logs
- Rebuild your nodes and try again