A consistent policy & compliance layer ensures platform guardrails are predictable, observable, progressive, and reversible. This document outlines how to use Kyverno (cluster runtime admission / mutation / validation) and Checkov (CI Infrastructure-as-Code scanning) under the same GitOps promotion model (App‑of‑Apps) to prevent last‑minute surprises.
In Scope: Kubernetes resource policies (Pods, Deployments, Ingress/Gateway, Secrets config), Helm chart templates (pre-commit / PR) (Checkov). Out of Scope: Runtime container behavior (handled by other controls) and cloud account CSPM (separate program).
Kyverno operates inside each cluster as an admission controller applying:
Kyverno advantages: Native K8s resources (no custom DSL needed), policy CRDs stored in Git, integrates cleanly with App‑of‑Apps for multi‑cluster rollout.
All new policies follow this state machine:
Draft (local branch) → Audit (dev) → Audit (staging) → Audit (prod) → Enforce (dev) → Enforce (staging) → Enforce (prod)
Rules:
Key spec fields:
validate:
failureAction: Audit # later switched to Enforce
failureActionOverrides: # fine-grained exceptions (optional)
- action: Audit
namespaces: ["job-runner"]
Exclude critical namespaces from initial enforcement to prevent cluster bootstrap or core service disruption:
kube-system, kube-public, kube-node-leaseImplementation patterns:
match selector restricts to application namespaces via label (e.g. team / app) instead of excluding system namespaces repeatedly.exclude block for explicit carve-outs until workloads remediated:
```yaml
match:
any:
Prefer structured exception channels, not ad-hoc disabling:
policy.exception/<rule>=approved) added via pull request with reviewer sign‑off.precondition or pattern negation.Example (skip privileged check if approved):
preconditions:
all:
- key: ""
operator: NotEquals
value: "true"
If teams need conditional resources (jobs, migrations), provide a separate Generate NetworkPolicy or allowlist policy limited by annotation.
Install Kyverno Policy Reports + report-ui to surface:
Argo CD sync ensures report-ui is versioned like policies. Grafana (optional) can scrape kyverno_policy_results_total metrics for alerting (e.g. spike after new chart release).
Operational checks:
| Check | Command | Outcome |
|——-|———|———|
| Policy CR status | kubectl get policyreports | Shows counts (pass/warn/fail) |
| Specific rule | kubectl get clusterpolicy disallow-privileged -o yaml | Confirms failureAction |
| Report UI | Browser → /kyverno/ path | Lists policies & violations |
| Phase | Focus | Actions | |——-|——-|———| | 0 | Visibility | Deploy Kyverno + policies (Audit) + report-ui | | 1 | Hygiene | Enforce low-risk (labels, resource limits) | | 2 | Security Baseline | Enforce no privileged / hostPath / forbidden capabilities | | 3 | Image Integrity | Add VerifyImages (signatures), still audit first | | 4 | Supply Chain | Enforce provenance attestations (later) |
Graduation criteria: sustained low (near-zero) audit violations for 1–2 release cycles in staging.
Checkov scans Terraform, Kubernetes manifests (Helm-rendered), and other IaC artifacts before merge:
Pipeline pattern:
helm template) for changed charts.Alignment with Kyverno:
cluster, environment labels) enforced both in CI (Checkov custom policies) and at runtime (Kyverno validate rules).Enforce in dev.| Task | Action | Tool |
|——|——–|——|
| Add new policy | Author ClusterPolicy (Audit), commit to dev values / policies dir | GitOps + Argo |
| Review violations | Open report-ui, filter by namespace / policy | UI |
| Promote to staging | Merge branch updating failureAction (still Audit) | Git |
| Enforce policy | Change failureAction: Enforce after thresholds | Git |
| Add exception | PR adding label/annotation + (optional) policy precondition tweak | Git |
| Rollback enforcement | Revert commit (Enforce → Audit) | Git |
| Monitor spike | Alert on metric delta (violations/min) | Grafana |
Adopt iteratively: visibility first (Audit), then low-risk hygiene, then privilege/security, then supply chain controls.