Kubernetes on Hetzner Cloud: The Complete Terraform Setup Guide
Marcus
Straight talk up front: Any mid-market company with 50 to 250 employees migrating workloads to Kubernetes routinely pays three to five times more on AWS EKS or Azure AKS than on Hetzner Cloud, with no meaningful added value for the typical application load at that company size. Choosing Hetzner is not a compromise. It is the architecturally correct decision.
Martin Fowler describes in Patterns of Enterprise Application Architecture (2002) a principle more relevant today than ever: complexity must never be introduced without proportional benefit. AWS and Azure deliver that complexity in industrial quantities. Hetzner delivers a clean, European, GDPR-compliant API with predictable costs. That is sufficient for nine out of ten mid-market projects.
This guide shows how to set up a production-ready Kubernetes cluster on Hetzner Cloud with Terraform. No Hello World. No "it depends." A setup that goes live on Monday.
The Architecture We Are Building
Three control-plane nodes (HA), three worker nodes, one Hetzner load balancer, a Hetzner private network, Cilium as the CNI, the Hetzner CSI driver for block storage, and k3s as the Kubernetes distribution.
Why k3s and not kubeadm? Sam Newman writes in Building Microservices (2nd ed., 2021) about "operational overhead" as a hidden architectural decision. kubeadm is powerful, but the maintenance burden for a DevOps team of two to four people is substantial: etcd backups, certificate rotation, node lifecycle management. k3s abstracts most of that away, is CNCF-certified, and has been running in thousands of production environments since 2019. The choice is clear.
Why Cilium and not Flannel? Cilium is built on eBPF and delivers NetworkPolicy enforcement, integrated observability via Hubble, and a measured 20–30% improvement in network performance over Flannel on identical hardware (Cilium Benchmark Report, 2023). Flannel is acceptable for learning purposes. Not for production.
Terraform: Provider and Remote State
terraform {
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "~> 1.47"
    }
  }

  backend "s3" {
    # Hetzner Object Storage, S3-compatible.
    # Note: Terraform 1.6+ deprecates "endpoint" and "force_path_style" in
    # favor of "endpoints = { s3 = ... }" and "use_path_style".
    endpoint                    = "https://fsn1.your-objectstorage.com"
    bucket                      = "tf-state-k8s"
    key                         = "kubernetes/terraform.tfstate"
    region                      = "eu-central-1" # placeholder; not validated (see skip_region_validation)
    skip_credentials_validation = true
    skip_metadata_api_check     = true
    skip_region_validation      = true
    force_path_style            = true
  }
}

provider "hcloud" {
  token = var.hcloud_token
}
Remote state is not optional. Anyone keeping the Terraform state on a laptop or committing it to Git has a serious security problem: the state file contains secrets in plaintext. Hetzner Object Storage as an S3 backend costs a few euros per month at most and keeps the state in one access-controlled place, out of version control.
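The provider block above references var.hcloud_token, and the server resources later in this guide reference var.environment, but neither variable is declared in the snippets. A minimal sketch of the declarations follows; the descriptions and the default value are assumptions, not part of the original setup.

```hcl
# Hetzner Cloud API token. Marked sensitive so Terraform redacts it in
# plan/apply output (it still ends up in the state file).
variable "hcloud_token" {
  description = "Hetzner Cloud API token with read/write access to the project"
  type        = string
  sensitive   = true
}

# Value for the "environment" label on all servers.
# The default is an assumption for illustration.
variable "environment" {
  description = "Environment label applied to all servers"
  type        = string
  default     = "production"
}
```

The token can be supplied through the TF_VAR_hcloud_token environment variable so that it never appears in a .tfvars file under version control.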
Network and SSH Infrastructure
resource "hcloud_ssh_key" "k8s_deploy" {
  name       = "k8s-deploy"
  public_key = file("~/.ssh/k8s_hetzner.pub")
}

resource "hcloud_network" "k8s_network" {
  name     = "k8s-private"
  ip_range = "10.0.0.0/16"
}

resource "hcloud_network_subnet" "k8s_subnet" {
  network_id   = hcloud_network.k8s_network.id
  type         = "cloud"
  network_zone = "eu-central"
  ip_range     = "10.0.1.0/24"
}
The private network is the central security boundary. Kubernetes-internal communication runs exclusively over this subnet, and the load balancer is the only entry point from the outside. Direct SSH access to worker nodes must be blocked in production; only a bastion host or an equivalent jump service is permitted.
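The SSH restriction described above can be expressed with a Hetzner Cloud Firewall. The following is a sketch under assumptions: the variable bastion_cidr and the firewall name are hypothetical, and the label selector relies on the role = "worker" label that the worker nodes in this guide carry. Hetzner Cloud Firewalls filter traffic on the public interface only, so traffic inside the private network is unaffected.

```hcl
# Hypothetical input: public address of the bastion host in CIDR notation,
# e.g. "203.0.113.10/32".
variable "bastion_cidr" {
  description = "Public IP of the bastion host in CIDR notation"
  type        = string
}

resource "hcloud_firewall" "workers" {
  name = "k8s-workers"

  # The only inbound rule: SSH from the bastion host.
  # Inbound traffic that matches no rule is dropped.
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "22"
    source_ips = [var.bastion_cidr]
  }

  # Attach the firewall to every server labelled role=worker.
  apply_to {
    label_selector = "role=worker"
  }
}
```

Because the load balancer reaches the nodes over the private network, no public inbound rule for HTTP or HTTPS is needed on the workers in this layout.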
Control Plane Nodes: HA Across Three Locations
resource "hcloud_server" "control_plane" {
  count       = 3
  name        = "k8s-cp-${count.index + 1}"
  server_type = "cax21" # Ampere ARM64, 4 vCPU, 8 GB RAM
  image       = "ubuntu-24.04"
  location    = element(["nbg1", "fsn1", "hel1"], count.index)
  ssh_keys    = [hcloud_ssh_key.k8s_deploy.id]

  network {
    network_id = hcloud_network.k8s_network.id
    ip         = "10.0.1.${count.index + 10}"
  }

  labels = {
    role        = "control-plane"
    environment = var.environment
  }

  # The subnet must exist before a server is attached to the network;
  # otherwise server creation can fail intermittently.
  depends_on = [hcloud_network_subnet.k8s_subnet]
}
Three control-plane nodes, distributed across Nuremberg, Falkenstein, and Helsinki. This is not paranoia; it is the minimum for HA. A single-node control plane is not acceptable, even for small clusters. The embedded etcd in k3s needs a majority of its members (quorum) to accept writes, so the member count should be odd: three members tolerate the loss of one. Three is the pragmatic optimum for mid-market projects.
The choice of cax21 (ARM64/Ampere) is deliberate: Hetzner's ARM instances offer the kind of price-to-performance advantage known from AWS Graviton, at Hetzner prices. Go, Rust, Python, and Node.js run natively on ARM64. The only hard constraint is proprietary binaries, including container images, without an ARM64 build. That is an architecture problem, not a Hetzner problem.
Worker Nodes and Load Balancer
resource "hcloud_server" "worker" {
  count       = 3
  name        = "k8s-worker-${count.index + 1}"
  server_type = "cax31" # Ampere ARM64, 8 vCPU, 16 GB RAM
  image       = "ubuntu-24.04"
  location    = element(["nbg1", "fsn1", "hel1"], count.index)
  ssh_keys    = [hcloud_ssh_key.k8s_deploy.id]

  network {
    network_id = hcloud_network.k8s_network.id
    ip         = "10.0.1.${count.index + 20}"
  }

  labels = {
    role        = "worker"
    environment = var.environment
  }

  # The subnet must exist before a server is attached to the network.
  depends_on = [hcloud_network_subnet.k8s_subnet]
}

resource "hcloud_load_balancer" "k8s_ingress" {
  name               = "k8s-ingress"
  load_balancer_type = "lb11"
  location           = "nbg1"
}

resource "hcloud_load_balancer_network" "k8s_ingress_network" {
  load_balancer_id = hcloud_load_balancer.k8s_ingress.id
  network_id       = hcloud_network.k8s_network.id

  # Same ordering requirement as for the servers.
  depends_on = [hcloud_network_subnet.k8s_subnet]
}
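The resources above create the load balancer and attach it to the private network, but they define no targets or services, so it would not forward any traffic yet. One way to complete it is sketched here, assuming the load balancer should front the Kubernetes API on port 6443 (the bootstrap commands in the next section pass its IP as --tls-san). The resource names control_plane and kube_api are illustrative.

```hcl
# Route to every server labelled role=control-plane over the private network.
resource "hcloud_load_balancer_target" "control_plane" {
  type             = "label_selector"
  load_balancer_id = hcloud_load_balancer.k8s_ingress.id
  label_selector   = "role=control-plane"
  use_private_ip   = true

  # Private-IP targets require the load balancer to be attached
  # to the network first.
  depends_on = [hcloud_load_balancer_network.k8s_ingress_network]
}

# Plain TCP pass-through so that TLS terminates at the API server itself.
resource "hcloud_load_balancer_service" "kube_api" {
  load_balancer_id = hcloud_load_balancer.k8s_ingress.id
  protocol         = "tcp"
  listen_port      = 6443
  destination_port = 6443
}
```

Ingress traffic on ports 80 and 443 is a separate concern: once the Hetzner Cloud Controller Manager is installed, it can provision or manage a load balancer for the ingress controller's Service.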
k3s Bootstrap: Initializing the Cluster
After terraform apply, the nodes are initialized via cloud-init. The first control-plane node starts as the cluster initiator; the remaining ones join as servers, not as agents:
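# First Control Plane Node
# --disable-cloud-controller and --kubelet-arg=cloud-provider=external hand
# node initialization over to the Hetzner Cloud Controller Manager (installed below).
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --tls-san "${LOAD_BALANCER_IP}" \
  --disable traefik \
  --disable servicelb \
  --disable-cloud-controller \
  --kubelet-arg=cloud-provider=external \
  --flannel-backend=none \
  --disable-network-policy \
  --secrets-encryption \
  --node-ip="${PRIVATE_IP}"

# Additional Control Plane Nodes
# Same flags as the first node, plus --server/--token to join the existing cluster.
curl -sfL https://get.k3s.io | sh -s - server \
  --server "https://${FIRST_CP_PRIVATE_IP}:6443" \
  --token "${K3S_TOKEN}" \
  --tls-san "${LOAD_BALANCER_IP}" \
  --disable traefik \
  --disable servicelb \
  --disable-cloud-controller \
  --kubelet-arg=cloud-provider=external \
  --flannel-backend=none \
  --disable-network-policy \
  --secrets-encryption \
  --node-ip="${PRIVATE_IP}"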
--disable traefik and --disable servicelb are non-negotiable: we replace them with the NGINX Ingress Controller and the Hetzner Cloud Controller Manager, respectively. --secrets-encryption enables encryption at rest for Kubernetes Secrets. It is disabled by default in k3s, which leaves secrets in plaintext on disk. That is not an acceptable state.
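The text says the nodes are initialized via cloud-init, but the guide does not show how the commands reach the servers. A minimal sketch of one possible wiring follows, assuming the hashicorp/random provider is available. The names k3s_token, first_cp_private_ip, k3s_server_flags, cloud_init_first_cp, and cloud_init_join_cp are hypothetical. Only flags from the commands above are used, and a per-node --node-ip for the joining servers is left out for brevity.

```hcl
# Shared cluster join token, generated once and kept in the (remote) state.
resource "random_password" "k3s_token" {
  length  = 48
  special = false
}

locals {
  # Private IP of k8s-cp-1: "10.0.1.${count.index + 10}" with count.index = 0.
  first_cp_private_ip = "10.0.1.10"

  # Flags shared by all server nodes, taken from the commands above.
  k3s_server_flags = join(" ", [
    "--tls-san ${hcloud_load_balancer.k8s_ingress.ipv4}",
    "--disable traefik",
    "--disable servicelb",
    "--flannel-backend=none",
    "--disable-network-policy",
    "--secrets-encryption",
  ])

  # cloud-init for the node that initializes the cluster.
  cloud_init_first_cp = <<-EOT
    #cloud-config
    runcmd:
      - curl -sfL https://get.k3s.io | K3S_TOKEN=${random_password.k3s_token.result} sh -s - server --cluster-init --node-ip=${local.first_cp_private_ip} ${local.k3s_server_flags}
  EOT

  # cloud-init for the control-plane nodes that join afterwards.
  cloud_init_join_cp = <<-EOT
    #cloud-config
    runcmd:
      - curl -sfL https://get.k3s.io | K3S_TOKEN=${random_password.k3s_token.result} sh -s - server --server https://${local.first_cp_private_ip}:6443 ${local.k3s_server_flags}
  EOT
}

# In hcloud_server.control_plane, the matching argument would be:
#   user_data = count.index == 0 ? local.cloud_init_first_cp : local.cloud_init_join_cp
```

Two caveats apply to this sketch: the joining nodes may boot before the first server is reachable, so a retry loop or a staged apply is likely needed, and the token becomes part of the server user data, which is one more reason to keep the state in access-controlled storage.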
Cilium and Hetzner Cloud Controller Manager
# Cilium via Helm
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set operator.replicas=2 \
  --set kubeProxyReplacement=true \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true

# Hetzner Cloud Controller Manager
kubectl apply -f \
  https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/latest/download/ccm-networks.yaml

# Hetzner CSI Driver
kubectl apply -f \
  https://raw.githubusercontent.com/hetznercloud/csi-driver/main/deploy/kubernetes/hcloud-csi.yml
The Hetzner CCM is the key to native integration: it automatically provisions load balancers for Service resources of type LoadBalancer. The CSI driver enables dynamic PersistentVolume creation via Hetzner Block Storage (NVMe-backed, up to 10,000 IOPS per volume).
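Both the CCM and the CSI driver read the Hetzner API token from a Kubernetes Secret named hcloud in kube-system, and the networks variant of the CCM also reads the network from it. The guide does not show that step. A sketch of creating the Secret from Terraform, assuming the hashicorp/kubernetes provider is configured against the new cluster (that provider configuration is not part of the original setup):

```hcl
# Secret consumed by the Hetzner CCM (token, network) and CSI driver (token).
# The provider base64-encodes the values in "data" automatically.
resource "kubernetes_secret_v1" "hcloud" {
  metadata {
    name      = "hcloud"
    namespace = "kube-system"
  }

  data = {
    token   = var.hcloud_token
    network = hcloud_network.k8s_network.name
  }
}
```

The Secret should exist before the two manifests are applied; otherwise the CCM and CSI pods cannot start until it is created.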
Anti-Patterns I See Every Day
1. Cluster Autoscaler from day one.
Auto-scaling solves no problem that a mid-market cluster typically has. It creates new ones: unpredictable costs, race conditions during node initialization, hard-to-debug states. Start with fixed node groups. Scale manually via Terraform. Introduce the Cluster Autoscaler when concrete utilization data is available, not before.
2. Everything in the default namespace.
Namespace isolation is not a luxury. It is the foundation for RBAC, NetworkPolicies, and resource quotas. Anyone deploying production workloads to default has no tenant separation, not even de facto. One namespace per team or service is the minimum.
3. kubectl apply directly to production.
Erich Gamma et al. describe in Design Patterns (GoF, 1994) why uncontrolled direct mutations corrupt a system's design. This applies to software objects just as much as to cluster state. GitOps via ArgoCD or Flux is not an optional add-on. It is the only answer to "what is currently running on my cluster, and why?"
4. No PodDisruptionBudget.
Without a PDB, node drains during upgrades or maintenance windows can take down every replica of a deployment at once. A few lines of YAML prevent that downtime. There is no justification for omitting them.
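Points 2 and 4 can be illustrated together. The sketch below is written in Terraform to stay in this guide's language and assumes a configured hashicorp/kubernetes provider. The namespace payments and the label app = "api" are hypothetical. In a GitOps setup as recommended in point 3, the same objects would live as YAML manifests in Git instead.

```hcl
# One namespace per team or service (hypothetical name).
resource "kubernetes_namespace_v1" "payments" {
  metadata {
    name = "payments"
  }
}

# Keep at least two "api" pods running during voluntary disruptions
# such as node drains.
resource "kubernetes_pod_disruption_budget_v1" "api" {
  metadata {
    name      = "api"
    namespace = kubernetes_namespace_v1.payments.metadata[0].name
  }

  spec {
    min_available = "2"

    selector {
      match_labels = {
        app = "api"
      }
    }
  }
}
```

With min_available set to 2, the deployment needs at least three replicas for a drain to make progress; otherwise the eviction is refused until more pods are ready.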
Pre-Go-Live Production Checklist
- Remote State (Hetzner Object Storage) configured, state locking enabled
- --secrets-encryption set on all control-plane nodes
- Daily etcd snapshots to Object Storage (k3s etcd-snapshot save)
- Cilium NetworkPolicies: default-deny between namespaces
- cert-manager with Let's Encrypt installed and verified
- ArgoCD or Flux deployed, all manifests sourced from Git
- kube-prometheus-stack (Prometheus + Grafana + Alertmanager) active
- PodDisruptionBudgets defined for all critical deployments
- Resource requests and limits set for every container
- Firewall rules: worker nodes not directly reachable from the internet
- Hetzner snapshot automation enabled for node recovery
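The checklist item on default-deny between namespaces can be sketched with a standard Kubernetes NetworkPolicy, which Cilium enforces. The example assumes a configured hashicorp/kubernetes provider and uses a hypothetical namespace payments. It allows ingress only from pods in the same namespace and would be repeated for each namespace.

```hcl
resource "kubernetes_network_policy_v1" "default_deny_cross_namespace" {
  metadata {
    name      = "default-deny-cross-namespace"
    namespace = "payments" # hypothetical namespace
  }

  spec {
    # Empty selector: the policy applies to every pod in the namespace.
    pod_selector {}

    policy_types = ["Ingress"]

    # Only pods from the same namespace may connect;
    # all other ingress is denied.
    ingress {
      from {
        pod_selector {}
      }
    }
  }
}
```

A policy like this also blocks the ingress controller if it runs in a different namespace, so an additional rule with a namespace_selector for that namespace is usually required.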
Costs: What the Comparison Actually Shows
A production-ready cluster of the described architecture (3× Control Plane cax21, 3× Worker cax31, 1× Load Balancer lb11) costs approximately €180–220/month on Hetzner, including traffic.
A comparable EKS cluster in Frankfurt (3× m6g.large control plane, 3× m6g.xlarge worker, 1× ALB) costs approximately €650–800/month on AWS, excluding data transfer costs, reserved instance discounts, and any support plan.
The 3–4× factor is real and documented. Over three years, the difference adds up to roughly €15,000–22,000 (between (650 − 220) × 36 and (800 − 180) × 36 at the monthly figures above), a five-figure sum that could alternatively flow into product development, engineering capacity, or reserves. That is Hetzner Cloud Consulting in concrete numbers.
Conclusion
Kubernetes on Hetzner with Terraform is not a compromise for budget-constrained teams. It is the architecturally correct answer for European mid-market businesses: GDPR-compliant, European data centers, predictable costs, no proprietary dependencies beyond a clean cloud API.
The architecture described here, k3s with HA, Cilium, Hetzner CCM and CSI, GitOps, encrypted remote state, is proven in production. It gives a DevOps team of two to four people the control they need, without generating the operational overhead that nobody budgeted for.
Recommendation: Start with this setup. Measure for three months. Then scale on the basis of real utilization data, not hyperscaler sales presentations.
Marcus is a Solution Architect at NextGen IT. He advises mid-market companies on cloud-native infrastructure, Hetzner Cloud Consulting, and Kubernetes architectures for teams of under 250 employees.