Experimenting with Kubernetes in Proxmox
I ran multiple RKE2 nodes in Proxmox and realized it was a waste of resources. What's the point of one control plane and two worker nodes if they all sit on the same hardware?
I moved to a single VM running k3s and another VM with Postgres. It was an interesting journey, from making the whole thing completely reproducible with Terraform and Ansible to setting up ArgoCD for a Vercel-like experience with my side projects.
The Setup
So here's the gist. I have a single bare-metal server on OVHcloud (8 vCPUs, 32 GB RAM) running Proxmox. On it, I spin up two VMs: one for k3s (8 vCPUs, 22 GB RAM, 250 GB disk) and one for Postgres (2 vCPUs, 8 GB RAM, 500 GB disk). Everything sits on a private network (10.1.0.0/24) behind a WireGuard VPN, so nothing is directly exposed. SSH is locked down to the VPN interface only, passwords are disabled, and the public zone on the firewall is set to DROP by default. The only things allowed through are HTTP, HTTPS, and the WireGuard port.
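The WireGuard side is a tiny config on the Proxmox host. A sketch of what it looks like (the VPN subnet, keys, and peer are placeholders, not my actual values):

```ini
# /etc/wireguard/wg0.conf on the Proxmox host (illustrative; keys and the
# 10.8.0.0/24 VPN subnet are placeholders)
[Interface]
Address = 10.8.0.1/24
ListenPort = 51820
PrivateKey = <host-private-key>

[Peer]
# my laptop
PublicKey = <laptop-public-key>
AllowedIPs = 10.8.0.2/32
```

With sshd bound to the `wg0` address, losing the VPN key means losing SSH, so keep an out-of-band console (OVHcloud's KVM/IPMI) as a fallback.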
Making It Reproducible with Terraform
The whole VM provisioning is done through Terraform using the bpg/proxmox provider. It starts by downloading a Debian 12 cloud image, creating a template VM from it, and then cloning that template for both VMs. Cloud-init handles the initial setup like SSH keys and package installation.
```hcl
resource "proxmox_virtual_environment_vm" "k3s" {
  name      = "k3s-vm"
  node_name = var.pm_node

  clone {
    vm_id = 9000
    full  = true
  }

  cpu {
    cores = var.k3s_cores
    type  = "host"
  }

  memory {
    dedicated = var.k3s_memory_mb
  }

  disk {
    datastore_id = var.pm_storage
    interface    = "scsi0"
    size         = var.k3s_disk_gb
  }

  initialization {
    datastore_id      = var.pm_storage
    user_data_file_id = proxmox_virtual_environment_file.secure_cloud_init.id

    ip_config {
      ipv4 {
        address = "${var.k3s_ip}/24"
        gateway = var.vm_gateway
      }
    }
  }
}
```

The cloud-init config disables password auth and only allows SSH key access. It also installs qemu-guest-agent so Terraform can talk to the VM properly.
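That cloud-init user-data is short. Roughly something like this (a sketch; the username, key, and exact package list are assumptions):

```yaml
#cloud-config
# Illustrative user-data: key-only SSH plus the guest agent
users:
  - name: debian
    ssh_authorized_keys:
      - ssh-ed25519 AAAA...  # your public key
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
ssh_pwauth: false
packages:
  - qemu-guest-agent
runcmd:
  - systemctl enable --now qemu-guest-agent
```

Without the guest agent, Terraform can't read the VM's IP back from Proxmox and the apply hangs waiting for it, so it's worth baking in from the start.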
Ansible Does the Rest
Once the VMs are up, Ansible takes over. There are essentially three phases:
Phase 1 sets up the Proxmox host itself. WireGuard, firewalld zones, the private bridge (vmbr1), and port forwarding from 80/443 on the public interface to the MetalLB VIP inside the cluster. It also creates a Terraform API user so that provisioning is fully automated.
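The forwarding piece boils down to a couple of firewalld rules. A sketch of the Ansible tasks, assuming the `ansible.posix.firewalld` module (the real playbook's rules may differ):

```yaml
# Illustrative tasks: forward 80/443 from the public zone to the MetalLB VIP.
# Masquerade is needed because the target lives on a different interface.
- name: Forward HTTP/HTTPS to the MetalLB VIP
  ansible.posix.firewalld:
    zone: public
    rich_rule: 'rule family="ipv4" forward-port port="{{ item }}" protocol="tcp" to-port="{{ item }}" to-addr="10.1.0.100"'
    permanent: true
    immediate: true
    state: enabled
  loop: ["80", "443"]

- name: Enable masquerading so forwarded traffic can return
  ansible.posix.firewalld:
    zone: public
    masquerade: true
    permanent: true
    immediate: true
    state: enabled
```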
Phase 2 sets up the Postgres VM. It installs PostgreSQL 18 from the official repo, configures TLS, tunes it for 8GB of RAM, and locks down pg_hba.conf so only the k3s VM can connect:
```
hostssl all all 10.1.0.10/32 scram-sha-256
```
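The memory tuning is mostly a handful of `postgresql.conf` settings scaled to the VM. Illustrative values for 8 GB following the usual rules of thumb (my playbook's exact numbers may differ):

```conf
# postgresql.conf tuning sketch for an 8 GB VM
shared_buffers = 2GB            # ~25% of RAM
effective_cache_size = 6GB      # ~75% of RAM; planner hint, not an allocation
maintenance_work_mem = 512MB    # faster VACUUM and index builds
work_mem = 16MB                 # per sort/hash, so keep it modest
wal_compression = on
```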
I also run a private Docker registry on this same VM. Caddy sits in front as a reverse proxy handling TLS termination, and the registry itself listens on localhost. The firewall only allows the k3s VM to reach both Postgres (5432) and the registry (5000).
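The Caddy config for the registry is only a few lines. A sketch, assuming Caddy listens on the private IP with its internal self-signed CA and the registry itself is bound to a localhost port (the ports here are assumptions):

```caddyfile
# Caddy terminates TLS on the private IP and proxies to the registry,
# which only listens on localhost
https://10.1.0.15:5000 {
    tls internal
    reverse_proxy localhost:5001
}
```

`tls internal` is why the k3s side needs `insecure_skip_verify` in its registry config: the cert chains to Caddy's local CA, not a public one.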
Phase 3 installs k3s on the other VM. It disables the built-in Traefik and ServiceLB since I'm using my own, loads the required kernel modules, and configures the private registry so k3s knows where to pull images from:
```yaml
mirrors:
  "10.1.0.15:5000":
    endpoint:
      - "https://10.1.0.15:5000"
configs:
  "10.1.0.15:5000":
    tls:
      insecure_skip_verify: true
```

The GitOps Part
This is where ArgoCD comes in and makes everything feel like Vercel. The Ansible playbook bootstraps ArgoCD, generates an SSH deploy key, and creates an ApplicationSet that watches a directory in my Git repo:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: helm-charts
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: git@github.com:org/infra.git
        revision: HEAD
        directories:
          - path: helm-charts/*
  template:
    metadata:
      name: "{{path.basename}}"
    spec:
      project: default
      # source tells each generated Application which chart directory to deploy
      source:
        repoURL: git@github.com:org/infra.git
        targetRevision: HEAD
        path: "{{path}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{path.basename}}-system"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
```

Every directory under helm-charts/ becomes an ArgoCD application automatically. Push a new chart and ArgoCD picks it up, creates the namespace, and deploys it. Delete a chart and ArgoCD cleans it up. Because it auto-prunes and self-heals, any manual change someone makes in the cluster gets reverted to match what's in Git.
I also run ArgoCD Image Updater, which watches my private registry for new image tags. So the workflow is: push code, CI builds and pushes an image, Image Updater detects it and updates the deployment, ArgoCD syncs. No kubectl, no manual deploys. Just push and forget.
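Wiring an Application into Image Updater is just annotations. A sketch (the app alias and image name are made up):

```yaml
# Annotations on an Argo CD Application that opt it into Image Updater
metadata:
  annotations:
    # watch this image in the private registry under the alias "app"
    argocd-image-updater.argoproj.io/image-list: app=10.1.0.15:5000/my-app
    # pick the highest semver tag
    argocd-image-updater.argoproj.io/app.update-strategy: semver
    # write the new tag back via the Argo CD API rather than a Git commit
    argocd-image-updater.argoproj.io/write-back-method: argocd
```

The `argocd` write-back method is the low-friction option; switching it to `git` would commit tag bumps back to the repo so Git stays the full source of truth.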
What's Running in the Cluster
The infrastructure layer has a few core components all deployed as Helm charts through ArgoCD:
- MetalLB for load balancing. Since this is bare metal there's no cloud provider to hand out IPs, so MetalLB advertises a single VIP (10.1.0.100) via L2. The Proxmox host forwards 80/443 to this IP.
```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
spec:
  addresses:
    - 10.1.0.100-10.1.0.100
```

- Traefik as the ingress controller, sitting behind the MetalLB VIP. It handles routing and TLS termination for all the apps.
- cert-manager with a Let's Encrypt ClusterIssuer using DNS-01 challenges through Cloudflare. I could not get this to work with HTTP-01 challenges and I'm so glad that an alternative exists.
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com  # placeholder
    # where cert-manager stores the ACME account key
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token
```

- Sealed Secrets for managing secrets in Git. I can encrypt secrets locally with `kubeseal`, commit them, and the controller decrypts them in the cluster. No secrets in plain text anywhere in the repo.
- GitHub Actions Runner Controller running self-hosted runners with Docker-in-Docker so CI/CD runs inside the cluster itself.
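For the runners, a minimal sketch using the legacy actions-runner-controller CRDs (the newer gha-runner-scale-set charts use a different API, and the repo name here is a placeholder):

```yaml
# Illustrative RunnerDeployment: two self-hosted runners for one repo,
# with dockerd running inside the runner container for DinD builds
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: infra-runners
spec:
  replicas: 2
  template:
    spec:
      repository: org/infra
      dockerdWithinRunnerContainer: true
```

Running CI inside the cluster means the runners can push to the private registry over the private network without any credentials leaving it.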
The Traffic Flow
It goes something like this:
```
Internet -> Proxmox host (ports 80/443)
         -> firewalld forwards to MetalLB VIP (10.1.0.100)
         -> Traefik picks up the request
         -> routes to the right app based on hostname
         -> app talks to Postgres over the private network
```

Everything internal uses Kubernetes service DNS. So an app that needs to talk to the auth service just hits auth.auth-system.svc.cluster.local:3000 instead of going through the public ingress.
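The last public hop in that flow is an ordinary Ingress. Roughly what one looks like here (the hostname, namespace, and service names are made up), with Traefik routing by host and cert-manager issuing the certificate through the ClusterIssuer:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-app-system
  annotations:
    # tells cert-manager to issue a cert for the hosts below
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 3000
  tls:
    - hosts: [app.example.com]
      secretName: my-app-tls
```

cert-manager watches the annotation, solves the DNS-01 challenge via Cloudflare, and drops the cert into `my-app-tls`, which Traefik then serves.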
Was It Worth It?
Honestly yeah. Going from 3 RKE2 nodes (all on the same hardware lol) to a single k3s node freed up a ton of resources. The GitOps setup with ArgoCD means I barely touch kubectl anymore. I push code, it builds, it deploys. If something breaks, I check the ArgoCD dashboard or just look at Git history. Rolling back is just a git revert.
The whole thing is reproducible too. If my server dies tomorrow, I run Terraform to create the VMs, Ansible to configure them, and ArgoCD pulls everything from Git. The only manual step is adding the deploy key to GitHub and restoring the Sealed Secrets encryption key from backup.