ADR-005: Centralized Ops Cluster Topology

Status

Accepted

Context

I needed to design a multi-cluster Kubernetes architecture that supports:

Common multi-cluster patterns have trade-offs:

Decision

I adopted a centralized Ops cluster topology where:

Ops Cluster (Central Command & Control):

Workload Clusters (Dev, Staging, Prod):

Key Principles:

  1. Ops cluster = authoritative source: All deployment decisions originate here
  2. Workload clusters = execution targets: They run workloads, not deployment control planes such as Argo CD
  3. Unidirectional deployment flow: Ops → workload clusters (GitOps reconciliation)
  4. Telemetry flows back: Metrics, logs, and traces flow from workload clusters → Ops
  5. Local enforcement: Policy (Kyverno) and ingress (Envoy Gateway) run locally in each workload cluster for low latency
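To make principles 1 to 4 concrete, the following Python sketch builds one Argo CD Application per workload cluster from the Ops cluster's point of view. It also encodes the allowed flow directions as a small lookup table. It is an illustration under stated assumptions, not this ADR's actual configuration. The Git repository URL, the cluster API endpoints, the app name "checkout", and the apps/<app>/overlays/<env> path layout are all hypothetical. The manifest fields (apiVersion argoproj.io/v1alpha1, spec.source, spec.destination.server, syncPolicy.automated) follow Argo CD's Application spec. The sketch assumes the manifests live in the Ops cluster's argocd namespace and that each workload cluster is already registered with that Argo CD instance.

```python
"""Sketch: hub-and-spoke GitOps topology seen from the Ops cluster.

Hypothetical values (not from the ADR): REPO_URL, the API endpoints,
the "checkout" app name, and the apps/<app>/overlays/<env> layout.
"""
from dataclasses import dataclass

# Hypothetical Git repository holding desired state for every environment.
REPO_URL = "https://git.example.com/platform/gitops.git"


@dataclass(frozen=True)
class WorkloadCluster:
    name: str        # environment name, e.g. "dev"
    api_server: str  # API endpoint registered in the Ops-cluster Argo CD


# Hypothetical endpoints for the three workload clusters.
WORKLOAD_CLUSTERS = [
    WorkloadCluster("dev", "https://dev.k8s.example.com"),
    WorkloadCluster("staging", "https://staging.k8s.example.com"),
    WorkloadCluster("prod", "https://prod.k8s.example.com"),
]


def argo_application(cluster: WorkloadCluster, app: str) -> dict:
    """Build an Argo CD Application that is stored in the Ops cluster
    (namespace "argocd") but deploys into a remote workload cluster."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": f"{app}-{cluster.name}", "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": REPO_URL,
                "targetRevision": "main",
                "path": f"apps/{app}/overlays/{cluster.name}",
            },
            # Only the destination points outward: Ops decides, workloads execute.
            "destination": {"server": cluster.api_server, "namespace": app},
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


# Permitted traffic kinds per (source role, destination role) pair:
# deployments go Ops -> workload (principle 3), telemetry goes back (principle 4).
ALLOWED_FLOWS = {
    ("ops", "workload"): {"gitops-sync"},
    ("workload", "ops"): {"metrics", "logs", "traces"},
}


def flow_allowed(src: str, dst: str, kind: str) -> bool:
    """Return True if traffic of `kind` may flow from role src to role dst."""
    return kind in ALLOWED_FLOWS.get((src, dst), set())


if __name__ == "__main__":
    for a in (argo_application(c, "checkout") for c in WORKLOAD_CLUSTERS):
        print(a["metadata"]["name"], "->", a["spec"]["destination"]["server"])
    print(flow_allowed("ops", "workload", "gitops-sync"))  # True
    print(flow_allowed("workload", "ops", "gitops-sync"))  # False
```

In this hub-and-spoke shape, every Application object and its sync state exist only in the Ops cluster. The workload clusters hold just the resulting resources, so an Ops outage pauses reconciliation while already-deployed workloads keep running.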

Consequences

Positive

Negative

Mitigations

Alternatives Considered

1. Federated Control Planes (Argo CD in Each Cluster)

Rejected because:

2. Regional Ops Clusters (Multiple Ops Clusters)

Rejected because:

3. Managed Argo CD (Argo CD SaaS)

Rejected because:

4. GitOps Tool per Cluster (Flux, Argo CD, etc.)

Rejected because:

5. No Centralized Control (Manual kubectl)

Rejected because:

Implementation Details

Argo CD Configuration:

Observability Flow:

Network Requirements:

Security:

Disaster Recovery

Ops Cluster Outage:

Workload Cluster Outage:

Future Considerations

References