Frequently Asked Questions (FAQ)

Common questions about Platform in a Box, organized by topic.


General Questions

What is “Platform in a Box”?

Platform in a Box is a batteries-included, GitOps-driven foundation for operating a Kubernetes platform using the Argo CD App-of-Apps pattern. It provides a complete set of Helm charts that compose the core traffic, security, observability, data, and enablement layers so teams can onboard applications quickly with consistent guardrails.

What Kubernetes version is required?

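Kubernetes v1.27+ is the supported minimum, primarily for Gateway API support. Note that the Gateway API CRDs do not ship with Kubernetes itself and must be installed separately (Envoy Gateway's Helm chart typically bundles them). Some components may work with earlier versions, but v1.27+ is recommended.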

What’s the difference between the ops cluster and workload clusters?

Can I use this with a single cluster?

Yes, you can deploy everything to a single cluster. However, the architecture is designed for multi-cluster scenarios. For single cluster, you can:

What cloud providers are supported?

The platform is cloud-agnostic and works with any Kubernetes distribution. However, some components have cloud-specific integrations:


Setup & Configuration

How do I add a new cluster?

  1. Create cluster values file:
    cp charts/app-of-apps/values.ops-01.yaml charts/app-of-apps/values.new-cluster.yaml
    

    Edit and update cluster name, server endpoint, and component enablement.

  2. Create bootstrap file:
    cp argocd-bootstrap-apps/ops-01.yaml argocd-bootstrap-apps/new-cluster.yaml
    

    Update application name, cluster server, and values file reference.

  3. Register cluster in Argo CD:
    argocd cluster add <cluster-context> --name <cluster-name>
    
  4. Apply bootstrap:
    kubectl apply -f argocd-bootstrap-apps/new-cluster.yaml
    

See Getting Started Guide for detailed steps.
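
Steps 1 and 2 above can be scripted. The following is a minimal sketch, assuming the source cluster name (for example ops-01) appears verbatim inside both files. `scaffold_cluster` is a hypothetical helper and not part of the repository. Server endpoints and component enablement still need a manual review afterwards.

```shell
# scaffold_cluster SRC DST  (hypothetical helper, not part of the repository)
# Copies the SRC cluster's values and bootstrap files to DST and replaces every
# literal occurrence of SRC with DST. Run from the repository root.
scaffold_cluster() {
  local src="$1" dst="$2"
  local values_dir="charts/app-of-apps"
  local bootstrap_dir="argocd-bootstrap-apps"

  sed "s/${src}/${dst}/g" "${values_dir}/values.${src}.yaml" \
    > "${values_dir}/values.${dst}.yaml" || return 1
  sed "s/${src}/${dst}/g" "${bootstrap_dir}/${src}.yaml" \
    > "${bootstrap_dir}/${dst}.yaml" || return 1

  echo "Created ${values_dir}/values.${dst}.yaml and ${bootstrap_dir}/${dst}.yaml"
}
```

For example, `scaffold_cluster ops-01 new-cluster` produces both files, after which steps 3 and 4 apply unchanged.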

How do I update a component version?

Update the source.targetRevision in the component’s block in your values file:

monitoring:
  enable: true
  source:
    targetRevision: v1.2.3  # or branch name, commit SHA

Or update the chart dependency version in the component’s Chart.yaml if using a Helm dependency.

How do I enable/disable a component?

Edit the environment values file (e.g., charts/app-of-apps/values.ops-01.yaml):

# Enable a component
monitoring:
  enable: true

# Disable a component
redis:
  enable: false

After committing and pushing, Argo CD will automatically sync the changes.
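
Before committing, it can help to see which components a values file currently enables. This is a rough sketch using awk, assuming the layout shown above (top-level component keys with an `enable` field indented by two spaces). `list_enabled` is a hypothetical helper and is not a real YAML parser.

```shell
# list_enabled FILE  (hypothetical helper)
# Prints each top-level key whose block contains "  enable: true".
# Relies on the two-space indentation used in the examples above.
list_enabled() {
  awk '
    /^[A-Za-z]/       { component = $1; sub(/:.*/, "", component) }
    /^  enable: true/ { print component }
  ' "$1"
}
```

Usage: `list_enabled charts/app-of-apps/values.ops-01.yaml`.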

How do I customize component configuration?

Each component has its own values file structure. Override settings in the environment-specific values file:

monitoring:
  enable: true
  helm:
    values: |
      prometheus:
        replicaCount: 3
      thanos:
        enabled: true

Or create a separate values file and reference it:

monitoring:
  enable: true
  helm:
    valueFiles:
      - values.monitoring-custom.yaml

What’s the difference between targetRevision: HEAD, staging, and stable?

This follows the Branching & Promotion Model.


Argo CD & GitOps

How does the App-of-Apps pattern work?

The App-of-Apps pattern uses a root Argo CD Application that manages other Applications. In this platform:

  1. Bootstrap Application (ops-01-bootstrap-apps) points to the app-of-apps chart
  2. The app-of-apps chart renders multiple Argo CD Application CRs (one per enabled component)
  3. Each child Application manages its component’s Helm chart
  4. Everything is orchestrated from a single root, which enables consistent deployment across clusters
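
To make the first step concrete, here is a sketch of what a root Application could look like. The field names follow the Argo CD Application CRD, but the project, target revision, destination, and sync policy are illustrative assumptions, and `render_bootstrap_app` is a hypothetical helper. The real bootstrap files live in argocd-bootstrap-apps/.

```shell
# render_bootstrap_app CLUSTER REPO_URL  (hypothetical helper)
# Prints a root Application that points at the app-of-apps chart.
# Project, revision, destination, and sync policy are assumptions.
render_bootstrap_app() {
  local cluster="$1" repo_url="$2"
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ${cluster}-bootstrap-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: ${repo_url}
    path: charts/app-of-apps
    targetRevision: stable
    helm:
      valueFiles:
        - values.${cluster}.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF
}
```

The output can be inspected first and then piped to `kubectl apply -f -`.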

How do I sync applications manually?

# Sync a specific application
argocd app sync <app-name>

# Sync all applications
argocd app list -o name | xargs -n 1 argocd app sync

# Force refresh (clears cache)
argocd app get <app-name> --hard-refresh

What are sync waves and why do I need them?

Sync waves control the order in which Argo CD syncs applications. Lower numbers sync first. This ensures dependencies are ready before dependents:

sealedSecrets:
  enable: true
  annotations:
    argocd.argoproj.io/sync-wave: "-1"  # Syncs first
certManager:
  enable: true
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # Syncs after sealed-secrets
envoyGateway:
  enable: true
  annotations:
    argocd.argoproj.io/sync-wave: "1"   # Syncs after cert-manager

See Traffic Management Guide for recommended sync waves.
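
To check which waves are actually in effect, the annotations can be read back from the cluster. This sketch assumes the annotations from the values file end up on the child Application objects in the argocd namespace. `list_sync_waves` is a hypothetical helper.

```shell
# list_sync_waves  (hypothetical helper)
# Prints "<application><TAB><sync-wave>" for every Application in the argocd
# namespace, lowest wave first. Applications without the annotation print an
# empty wave, which sort -n treats as 0.
list_sync_waves() {
  kubectl get applications -n argocd \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.argocd\.argoproj\.io/sync-wave}{"\n"}{end}' \
    | sort -t "$(printf '\t')" -k 2,2n
}
```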

How do I rollback a deployment?

Since everything is Git-driven, rollback by reverting the Git commit or moving the tag:

# Revert commit
git revert <commit-sha>
git push

# Or move stable tag to previous commit
git tag -f stable <previous-commit-sha>
git push -f origin stable

Argo CD will automatically detect the change and sync to the previous state.

Can I use a different Git repository?

Yes, update the source.repoURL in:

Ensure Argo CD has access to the repository (SSH key or HTTPS credentials).


Components

How do I rotate sealed secrets keys?

  1. Generate new key pair:
    openssl req -x509 -days 3650 -nodes -newkey rsa:4096 \
      -keyout new-sealing.key \
      -out new-sealing.crt \
      -subj "/CN=sealed-secret/O=sealed-secret"
    
  2. Create new secret in cluster:
    kubectl -n sealed-secrets create secret tls new-sealing-key \
      --cert=new-sealing.crt --key=new-sealing.key
    
  3. Label as active:
    kubectl -n sealed-secrets label secret new-sealing-key \
      sealedsecrets.bitnami.com/sealed-secrets-key=active
    
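  4. Reseal existing secrets with the new certificate:
    kubeseal --format yaml --cert=new-sealing.crt < secret.yaml > sealed-secret.yaml
    
  5. Remove label from old key, once nothing sealed with it remains (the controller can no longer decrypt those secrets afterwards):
    kubectl -n sealed-secrets label secret old-sealing-key \
      sealedsecrets.bitnami.com/sealed-secrets-key-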
    

See Sealed Secrets README for details.

How do I add a new domain for certificate issuance?

  1. Create DNS zone (in your DNS provider)

  2. Update cert-manager values to include new domain:
    issuers:
      newDomain: "enable"
    
  3. Create ClusterIssuer template (if needed) in charts/cert-manager/templates/certificates/

  4. Create Certificate CR for the domain:
    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: newdomain-com
    spec:
      dnsNames:
        - "*.newdomain.com"
        - "newdomain.com"
      issuerRef:
        kind: ClusterIssuer
        name: newdomain-com-issuer
      secretName: newdomain-com-tls
    
  5. Update external-dns domain filters to include new domain

See Traffic Management Guide for details.
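
After applying the Certificate from step 4, issuance can be confirmed by waiting on its Ready condition. This is a small sketch. `wait_for_certificate` is a hypothetical helper, and the default namespace and timeout are assumptions.

```shell
# wait_for_certificate NAME [NAMESPACE] [TIMEOUT]  (hypothetical helper)
# Blocks until cert-manager marks the Certificate Ready, or the timeout expires.
wait_for_certificate() {
  local name="$1" namespace="${2:-default}" timeout="${3:-300s}"
  kubectl wait "certificate/${name}" -n "$namespace" \
    --for=condition=Ready --timeout="$timeout"
}
```

If the wait times out, `kubectl describe certificate newdomain-com` usually shows which step is pending.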

How do I expose a service via Envoy Gateway?

  1. Create HTTPRoute:
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: my-service
    spec:
      parentRefs:
        - name: envoy-public
          namespace: envoy-gateway-system
      hostnames:
        - my-service.example.com
      rules:
        - matches:
            - path:
                type: PathPrefix
                value: /
          backendRefs:
            - name: my-service
              port: 80
    
  2. Ensure Gateway exists (created by envoy-gateway chart)

  3. Ensure certificate exists for the hostname

  4. Annotate for DNS (if using external-dns):
    metadata:
      annotations:
        external-dns.alpha.kubernetes.io/hostname: my-service.example.com
    

See Envoy Gateway README for details.
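
Once the HTTPRoute is applied, its status shows whether the Gateway accepted it. This sketch reads the standard Gateway API `Accepted` condition. `route_accepted` is a hypothetical helper, and the default namespace is an assumption.

```shell
# route_accepted NAME [NAMESPACE]  (hypothetical helper)
# Succeeds if any parent Gateway reports the route's Accepted condition as True.
route_accepted() {
  local name="$1" namespace="${2:-default}" status
  status=$(kubectl get httproute "$name" -n "$namespace" \
    -o jsonpath='{.status.parents[*].conditions[?(@.type=="Accepted")].status}')
  case " $status " in
    *" True "*) return 0 ;;
    *) echo "HTTPRoute ${namespace}/${name} not accepted (status: ${status:-none})" >&2
       return 1 ;;
  esac
}
```

Usage: `route_accepted my-service && echo "route is live"`.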

What’s the difference between Audit and Enforce mode in Kyverno?

Always start with Audit mode, measure violations, remediate, then enable Enforce mode. See Compliance Guide.

How do I add a new Kyverno policy?

  1. Create policy file in charts/kyverno/templates/policies/ops/ or security/

  2. Wrap with values conditional:
    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    ...
    
  3. Add to values file:
    opsPolicies:
      newPolicy:
        enabled: true
        mode: Audit  # Start with Audit
    
  4. Document rationale in policy annotations

See Kyverno README for details.
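
The "values conditional" in step 2 refers to gating the policy on a Helm value. The sketch below writes one possible template, assuming the values layout from step 3 (`opsPolicies.newPolicy.enabled` and `.mode`). The policy name, the rule, and the `write_policy_template` helper are hypothetical and only illustrate the shape.

```shell
# write_policy_template PATH  (hypothetical helper)
# Writes a ClusterPolicy template gated on opsPolicies.newPolicy.enabled.
# The quoted 'EOF' keeps the shell from expanding the Helm template syntax.
write_policy_template() {
  cat > "$1" <<'EOF'
{{- if .Values.opsPolicies.newPolicy.enabled }}
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: new-policy
  annotations:
    policies.kyverno.io/description: "Example: every Pod must carry a team label."
spec:
  validationFailureAction: {{ .Values.opsPolicies.newPolicy.mode }}
  rules:
    - name: require-team-label
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "The label 'team' is required."
        pattern:
          metadata:
            labels:
              team: "?*"
{{- end }}
EOF
}
```

Rendering the chart with `helm template charts/kyverno` is a quick way to confirm the policy appears only when the value is enabled.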

How do I add a new Prometheus alert rule?

  1. Create or edit rule file in charts/monitoring/configs/alert-rules/

  2. Update Alert Catalog with the new alert

  3. Test the rule:

    promtool check rules configs/alert-rules/my-alerts.yaml
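
For reference, a minimal rule file that passes this check could look like the one written by the sketch below. The group name, alert name, expression, and severity are illustrative assumptions, and `write_example_rule` is a hypothetical helper.

```shell
# write_example_rule PATH  (hypothetical helper)
# Writes a minimal Prometheus rule group containing a single alert.
write_example_rule() {
  cat > "$1" <<'EOF'
groups:
  - name: example-availability
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Target {{ $labels.instance }} is down"
          description: "{{ $labels.job }} has been unreachable for 5 minutes."
EOF
}
```

Usage: `write_example_rule configs/alert-rules/my-alerts.yaml`, then run the promtool check above against the same path.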
    

Operations & Maintenance

How often should I update components?

Follow the Promotion Flow: dev → staging → stable.

How do I backup the platform configuration?

Everything is in Git, so your Git repository is the backup. Additionally:

How do I scale components?

Update replica counts in component values:

monitoring:
  enable: true
  helm:
    values: |
      prometheus:
        replicaCount: 3  # Scale Prometheus

Or scale directly (not recommended: Argo CD reverts manual changes when it next syncs the application):

kubectl scale deployment <deployment> -n <namespace> --replicas=3

How do I check platform health?

# Check all Argo CD applications
kubectl get applications -n argocd

# Check component pods
kubectl get pods -A | grep -E "monitoring|logging|envoy|kyverno"

# Check via Argo CD UI
# Navigate to Argo CD and review application health status
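
On larger installations, listing only the applications that need attention is often more useful than scanning the full table. This sketch filters on the Application's `status.sync.status` and `status.health.status` fields. `list_unhealthy_apps` is a hypothetical helper.

```shell
# list_unhealthy_apps  (hypothetical helper)
# Prints Applications that are not both Synced and Healthy, one per line as
# "<name> <sync-status> <health-status>". Prints nothing when all is well.
list_unhealthy_apps() {
  kubectl get applications -n argocd \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.sync.status}{" "}{.status.health.status}{"\n"}{end}' \
    | awk 'NF && ($2 != "Synced" || $3 != "Healthy")'
}
```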

How do I add a new environment (e.g., QA)?

  1. Create values file: charts/app-of-apps/values.qa-01.yaml
  2. Create bootstrap file: argocd-bootstrap-apps/qa-01.yaml
  3. Register cluster in Argo CD
  4. Apply bootstrap: kubectl apply -f argocd-bootstrap-apps/qa-01.yaml

Follow the same pattern as existing environments.

How do I migrate from traditional Helm to this GitOps approach?

  1. Export existing Helm releases:
    helm list -A
    helm get values <release> -n <namespace> > values.yaml
    
  2. Create equivalent values files in this repository structure

  3. Disable old Helm releases (don’t delete yet)

  4. Bootstrap Argo CD and let it take over

  5. Verify everything works, then remove old Helm releases

See Getting Started Guide for the GitOps setup process.
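
Step 1 can be looped over every release. This is a sketch that parses the default `helm list -A` table (name in the first column, namespace in the second). `export_helm_values` and the output directory name are hypothetical. Note that `helm get values` returns only user-supplied values, not chart defaults.

```shell
# export_helm_values [OUTDIR]  (hypothetical helper)
# Saves the user-supplied values of every Helm release to
# OUTDIR/<namespace>-<release>.yaml.
export_helm_values() {
  local outdir="${1:-helm-export}"
  mkdir -p "$outdir" || return 1
  # Skip the header row, then read "<release> <namespace>" pairs.
  helm list -A | awk 'NR > 1 { print $1, $2 }' | while read -r release namespace; do
    helm get values "$release" -n "$namespace" -o yaml < /dev/null \
      > "${outdir}/${namespace}-${release}.yaml"
    echo "exported ${namespace}/${release}"
  done
}
```

The exported files are a starting point for the values files in step 2.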


Troubleshooting

Why is my application stuck in “Syncing” state?

Common causes:

See Troubleshooting Guide for detailed diagnosis.

Why are certificates not issuing?

Common causes:

See Troubleshooting Guide for diagnosis steps.

Why are DNS records not being created?

Common causes:

See Troubleshooting Guide for solutions.

Why is Prometheus not scraping my service?

Common causes:

See Troubleshooting Guide for diagnosis.

Why is Elasticsearch cluster status RED?

Common causes:

See Troubleshooting Guide for solutions.

How do I debug Argo CD sync issues?

# Check application status
kubectl describe application <app-name> -n argocd

# Check controller logs
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-application-controller

# Force refresh
argocd app get <app-name> --hard-refresh

See Troubleshooting Guide for comprehensive debugging steps.


Best Practices

Should I use sealed secrets or external secrets operator?

This platform uses Sealed Secrets because:

External Secrets Operator is an alternative if you prefer cloud-native secret management (AWS Secrets Manager, HashiCorp Vault, etc.).

Should I use Gateway API or Ingress?

Use Gateway API (Envoy Gateway) for:

Use Ingress (NGINX) only for:

The platform includes both, but Gateway API is preferred.

How should I organize my Git repository?

Recommended structure:

k8s/
├── charts/              # Helm charts
│   ├── app-of-apps/
│   ├── monitoring/
│   └── ...
├── argocd-bootstrap-apps/  # Bootstrap applications
├── docs/                # Documentation
└── README.md

Keep environment-specific values in charts/<chart>/values.<env>.yaml.

Should I use tags or branches for production?

Use tags (stable) for production because:

Branches move with every push, making it harder to track what’s deployed. A tag moves only when someone deliberately retags it, so each promotion is an explicit, auditable step.

How do I handle secrets for multiple clusters?

Options:

  1. Same key, different scopes: Use namespace-scoped sealed secrets
  2. Different keys per cluster: Generate cluster-specific sealing keys
  3. Centralized key with cluster labels: Use cluster-wide scope with cluster-specific labels

Recommendation: Use cluster-specific keys for better isolation and security.

Should I enable all components in all clusters?

No. Enable components based on cluster role:

See What Runs Where for the component matrix.

How do I handle certificate renewal?

Cert-manager handles automatic renewal. However:

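By default, cert-manager renews a certificate once two-thirds of its lifetime has elapsed, which is 30 days before expiration for a 90-day certificate. Set spec.renewBefore on the Certificate to change this.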

How do I ensure high availability?

See Argo CD Best Practices for HA configuration.

How do I monitor the platform itself?

The monitoring stack monitors itself:

Set up alerts for:


Still Have Questions?

If you’re stuck or need further help, feel free to reach out directly.