Beyond the Install: Architecting a Hardened RKE2 + Rancher Platform on Rocky Linux
Kubernetes is easy to install but significantly harder to "wire" correctly, especially once you factor in High Availability (HA) networking, certificate management, and a production-grade backup strategy from day one.
This week, I stood up a new Kubernetes platform built on Rocky Linux using RKE2 and Rancher, with Kasten K10 integrated for data protection. Here is a look at the architecture and the hardening steps taken to get it production-ready.
Architecture Highlights
- RKE2: Leveraging RKE2 for an upstream-aligned Kubernetes distribution with embedded etcd.
- HA Networking: HAProxy provides a Virtual IP (VIP) for both the API server (6443) and ingress traffic (80/443).
- Ingress Path: RKE2 ingress-nginx sits behind a NodePort, which is fronted by the HAProxy layer.
- Centralized Management: Rancher handles the cluster lifecycle and RBAC.
- Internal PKI: A custom Root CA was used for all externally exposed services to maintain a private, trusted chain.
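To make the HA networking and ingress path concrete, here is a minimal sketch that renders the kind of HAProxy frontend/backend pairs described above. It is an illustration under assumptions, not the config from this build: the node names, the 192.0.2.x addresses, and the NodePort numbers (30080/30443) are hypothetical placeholders. It uses TCP mode, so TLS is passed through to the API server and ingress-nginx rather than terminated at the load balancer.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Node:
    name: str      # hypothetical node name
    address: str   # hypothetical node IP


def render_listener(name: str, bind_port: int,
                    nodes: Sequence[Node], target_port: int) -> str:
    """Render one TCP frontend/backend pair in HAProxy config syntax."""
    lines = [
        f"frontend {name}",
        f"    bind *:{bind_port}",
        "    mode tcp",
        f"    default_backend {name}_nodes",
        "",
        f"backend {name}_nodes",
        "    mode tcp",
        "    balance roundrobin",
    ]
    # "check" turns on HAProxy health checks so a failed node leaves rotation.
    lines += [f"    server {n.name} {n.address}:{target_port} check" for n in nodes]
    return "\n".join(lines) + "\n"


def render_haproxy(servers: Sequence[Node], ingress_nodes: Sequence[Node],
                   http_nodeport: int, https_nodeport: int) -> str:
    """API server on 6443, ingress on 80/443 forwarded to the NodePorts."""
    sections = [
        render_listener("kube_api", 6443, servers, 6443),
        render_listener("ingress_http", 80, ingress_nodes, http_nodeport),
        render_listener("ingress_https", 443, ingress_nodes, https_nodeport),
    ]
    return "\n".join(sections)


if __name__ == "__main__":
    control_plane = [Node("cp1", "192.0.2.11"), Node("cp2", "192.0.2.12")]
    workers = [Node("w1", "192.0.2.21"), Node("w2", "192.0.2.22")]
    print(render_haproxy(control_plane, workers, 30080, 30443))
```

The output is only the listener sections; a real haproxy.cfg also needs global and defaults sections with timeouts, and the NodePort values have to match whatever the ingress-nginx Service actually exposes.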
The Hardening "Gotchas"
Setting up a cluster is one thing; hardening it requires navigating some specific friction points:
- TLS Identity: I replaced the default dynamiclistener certificates with internally signed TLS certificates.
- Ingress Clean-up: I removed cert-manager ingress shims to prevent secret regeneration conflicts that often occur when mixing internal CA logic with automated controllers.
- Access Control: I implemented explicit RBAC-backed access to Kasten using Kubernetes bearer tokens rather than broad permissions.
- Scoped Protection: Backup policies are scoped at the namespace level, moving away from broad cluster-wide defaults to ensure granular recovery points.
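One way to confirm the TLS identity change took effect is to connect to an endpoint while trusting only the internal Root CA. The sketch below uses Python's standard ssl module; the function names, the host, and the CA file path are my own placeholders, not part of the original setup. If the endpoint still serves a certificate outside the internal chain, the handshake fails with ssl.SSLCertVerificationError.

```python
import socket
import ssl
from typing import Optional


def build_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client context that requires a valid chain and a hostname match.

    When ca_file is given, only that bundle is trusted; when it is None,
    the system trust store is loaded instead.
    """
    return ssl.create_default_context(cafile=ca_file)


def issuer_field(cert: dict, field: str) -> str:
    """Pull one field (e.g. commonName) from a getpeercert() issuer entry."""
    for rdn in cert.get("issuer", ()):
        for key, value in rdn:
            if key == field:
                return value
    return ""


def leaf_issuer(host: str, port: int, ca_file: str, timeout: float = 5.0) -> str:
    """Handshake against host:port trusting only ca_file; return the issuer CN
    of the leaf certificate (which may be an intermediate, not the root)."""
    ctx = build_context(ca_file)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return issuer_field(tls.getpeercert(), "commonName")


# Hypothetical usage (host and path are placeholders):
#   leaf_issuer("rancher.example.internal", 443, "/etc/pki/internal-root-ca.pem")
```

Running the same check against each externally exposed hostname gives a quick signal that nothing is still presenting a default or regenerated certificate.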
The Storage & Backup Layer
For this phase, the storage layer uses an NFS external provisioner as the default StorageClass. Because this is a non-CSI backend, Kasten is currently configured for file-level backups.
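For the namespace-scoped policies mentioned earlier, a policy object might be assembled roughly like this. Treat it as a sketch under assumptions: the field names follow the Kasten Policy resource as I understand it (config.kio.kasten.io/v1alpha1, selecting on the k10.kasten.io/appNamespace key) and should be checked against the documentation for your Kasten version. The namespace name, schedule, and retention count are hypothetical, and the kasten-io install namespace is assumed.

```python
import json


def namespace_backup_policy(namespace: str, frequency: str = "@daily",
                            daily_retention: int = 7) -> dict:
    """Build a backup Policy manifest that targets exactly one namespace."""
    return {
        "apiVersion": "config.kio.kasten.io/v1alpha1",  # assumed API version
        "kind": "Policy",
        "metadata": {
            "name": f"{namespace}-backup",
            "namespace": "kasten-io",  # assumed Kasten install namespace
        },
        "spec": {
            "frequency": frequency,
            "retention": {"daily": daily_retention},
            # Scope the policy to a single application namespace
            # instead of selecting everything in the cluster.
            "selector": {
                "matchExpressions": [{
                    "key": "k10.kasten.io/appNamespace",
                    "operator": "In",
                    "values": [namespace],
                }]
            },
            "actions": [{"action": "backup"}],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so the output can be reviewed and applied.
    print(json.dumps(namespace_backup_policy("payments"), indent=2))
```

Generating one policy per namespace keeps each recovery point tied to a single application, which is the point of moving away from cluster-wide defaults.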
The Takeaway
The real work of platform engineering isn't in the yum install. It’s in the integration: making sure the certificates don't conflict, the load balancer correctly tracks the ingress nodes, and the backup policy knows exactly what it’s looking at.
Next Phase: Validating restore workflows, evaluating CSI-backed storage to enable snapshot-based protection, and testing S3 bucket integration.