Multi-Cluster Kubernetes: Patterns, Tools, and When to Use Them
Multi-cluster Kubernetes patterns explained: when to use them, Cluster API, ArgoCD ApplicationSets, Cilium ClusterMesh, and cross-cluster load balancing.
Multi-cluster Kubernetes solves problems that can’t be solved within a single cluster — but it introduces a layer of operational complexity that many teams underestimate. Before adopting multi-cluster, you need to be clear on which problem you’re solving. Multi-cluster for the wrong reasons leads to twice the operational burden with half the benefit.
This guide covers the legitimate reasons for multi-cluster, the architectural patterns that work at scale, and the tooling ecosystem (Cluster API, ArgoCD ApplicationSets, Cilium ClusterMesh, Istio multi-primary) that makes it manageable.
When Multi-Cluster Actually Makes Sense
Availability and blast radius reduction: A cluster failure affects all workloads in that cluster. Multiple clusters in multiple regions or availability zones limit the scope of a failure. This matters when you have an SLA that a single-cluster deployment can’t meet.
Compliance and data residency: Many regulations require data to stay within specific geographic boundaries. You can’t run a single cluster that spans a US region and an EU region and satisfy GDPR data residency requirements. Multi-cluster is mandatory when workloads have different data residency requirements.
Workload isolation: Some workloads should be physically isolated for security reasons — not just namespace-isolated. A PCI DSS cardholder data environment (CDE) may need to run in a dedicated cluster with stricter network controls, separate from your general workloads.
Scale beyond single-cluster limits: A single Kubernetes cluster scales to approximately 5,000 nodes and 150,000 pods (the practical limit is usually lower due to etcd performance). Most organizations won’t hit this. But very large deployments may need to shard workloads across clusters.
Team autonomy at scale: In large organizations, giving separate business units or product teams their own cluster provides clean separation of concerns, independent upgrade schedules, and cost attribution.
When NOT to use multi-cluster:
- You have fewer than 50 engineers
- Your motivation is “namespace isolation is complex” (fix your namespace strategy first)
- You’re trying to solve a cost problem (multi-cluster increases costs)
- You don’t have dedicated platform engineering capacity (multi-cluster significantly increases platform toil)
Cluster Lifecycle Management with Cluster API
Cluster API (CAPI) is the Kubernetes-native way to provision and manage multiple clusters. Instead of using cloud-specific CLI tools or Terraform, you declare clusters as Kubernetes resources and a CAPI controller creates and manages them.
# A Cluster API cluster definition (simplified)
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: production-eu-west
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  infrastructureRef:
    kind: AWSCluster # Infrastructure provider: AWS, GCP, Azure, vSphere, etc.
    name: production-eu-west
  controlPlaneRef:
    kind: KubeadmControlPlane
    name: production-eu-west-control-plane
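The infrastructureRef above points at a provider-specific resource. A minimal companion AWSCluster might look like the following sketch — the region and SSH key name are illustrative assumptions, not part of the original example:

```yaml
# Hypothetical companion resource for the infrastructureRef above
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSCluster
metadata:
  name: production-eu-west
  namespace: default
spec:
  region: eu-west-1          # Illustrative region
  sshKeyName: platform-team  # Assumed pre-existing EC2 key pair
```

The same pattern applies to the KubeadmControlPlane reference: each referenced kind is a separate resource reconciled by its own CAPI provider controller.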
CAPI’s value proposition: cluster lifecycle (create, upgrade, scale, delete) becomes a GitOps operation. Your cluster infrastructure is declared in Git, and changes are applied by the CAPI controller running in your management cluster.
Management cluster pattern: one cluster runs CAPI and manages all workload clusters. This is the management cluster. Workload clusters run your applications. Never run CAPI on the same cluster you’re provisioning (bootstrapping problem).
CAPI providers exist for all major infrastructure platforms — AWS (CAPA), GCP (CAPG), Azure (CAPZ), VMware vSphere (CAPV), and bare metal.
GitOps Across Clusters with ArgoCD ApplicationSets
ArgoCD ApplicationSets are the scalable way to deploy applications to multiple clusters from a single ArgoCD instance.
The cluster generator creates one ArgoCD Application per registered cluster:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: platform-services
  namespace: argocd
spec:
  generators:
  - clusters:
      selector:
        matchLabels:
          environment: production # Only production clusters
  template:
    metadata:
      name: '{{name}}-platform-services'
    spec:
      project: platform
      source:
        repoURL: https://github.com/my-org/platform-config
        targetRevision: HEAD
        path: platform-services/
      destination:
        server: '{{server}}'
        namespace: platform-system
The matrix generator combines multiple generators and produces one Application per element of their cross-product — for example, one Application per (cluster × application discovered in Git):
generators:
- matrix:
    generators:
    - clusters: {}
    - git:
        repoURL: https://github.com/my-org/apps
        revision: HEAD
        files:
        - path: "apps/*/config.json"
This pattern enables “deploy this set of applications to all clusters” as a declarative configuration — add a new cluster, add its label, and all ApplicationSets targeting that label automatically create application deployments for it.
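The clusters generator matches labels on ArgoCD's cluster registration Secrets, so "add its label" means labeling the Secret. A sketch of a declarative registration — the cluster name, server URL, and credentials are illustrative placeholders:

```yaml
# Hypothetical cluster registration Secret; the clusters generator
# selects clusters by the labels set here
apiVersion: v1
kind: Secret
metadata:
  name: production-eu-west
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    environment: production # Matched by the ApplicationSet selector above
stringData:
  name: production-eu-west
  server: https://production-eu-west.example.com:6443
  config: |
    {"tlsClientConfig": {"insecure": false}}
```

Applying this Secret is the only step needed for every ApplicationSet selecting `environment: production` to start deploying to the new cluster.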
Service Connectivity: Cilium ClusterMesh
Cilium ClusterMesh extends pod networking across multiple clusters, enabling:
- Services in one cluster to be accessed by pods in other clusters
- Global services that load-balance across clusters
- Shared identity for network policy enforcement across clusters
# Enable ClusterMesh on both clusters
cilium clustermesh enable --context cluster-a
cilium clustermesh enable --context cluster-b

# Connect the clusters
cilium clustermesh connect \
  --context cluster-a \
  --destination-context cluster-b

# Verify mesh connectivity
cilium clustermesh status --context cluster-a
To expose a service across clusters, add an annotation:
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    service.cilium.io/global: "true" # Make this service visible mesh-wide
    service.cilium.io/shared: "true" # Contribute this cluster's backends to the global service
With these annotations, pods in any cluster in the mesh can reach my-service, and requests are load-balanced across that service's pods in every cluster.
Use case: active-active deployment across two clusters in different regions. Cilium ClusterMesh provides transparent cross-cluster load balancing without application changes.
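For active-active deployments you usually want same-cluster backends preferred, with remote clusters used only as failover. Cilium expresses this with an affinity annotation — a sketch (other documented values are `remote` and `none`):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    service.cilium.io/global: "true"
    service.cilium.io/shared: "true"
    service.cilium.io/affinity: "local" # Prefer local backends; fail over to remote clusters
```

This keeps cross-region latency out of the hot path while preserving automatic failover.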
Service Mesh Cross-Cluster: Istio Multi-Primary
Istio multi-primary runs an Istio control plane in each cluster (each cluster is a primary) and enables cross-cluster service discovery and mTLS.
Architecture for two clusters in the same network:
# Install Istio on cluster 1 (multi-primary profile)
istioctl install --context cluster-1 \
  -f cluster1.yaml # Contains mesh ID, cluster name, network

# Install Istio on cluster 2
istioctl install --context cluster-2 \
  -f cluster2.yaml

# Create remote secrets so each cluster can discover the other's endpoints
istioctl create-remote-secret \
  --context cluster-1 \
  --name cluster-1 | kubectl apply --context cluster-2 -f -

istioctl create-remote-secret \
  --context cluster-2 \
  --name cluster-2 | kubectl apply --context cluster-1 -f -
After setup, services in either cluster can call services in the other cluster by name. Istio handles service discovery and mTLS transparently.
Multi-primary vs primary-remote: multi-primary (each cluster has its own control plane) provides better availability — a control-plane failure in one cluster doesn’t affect the other. Primary-remote (one control plane manages multiple clusters) is simpler but creates a single point of failure.
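The cluster1.yaml referenced above typically carries the mesh, cluster, and network identity. A minimal sketch — the mesh ID, cluster name, and network values are illustrative assumptions:

```yaml
# Hypothetical cluster1.yaml for the multi-primary install
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1               # Same for every cluster in the mesh
      multiCluster:
        clusterName: cluster-1    # Unique per cluster
      network: network1           # Same value when clusters share a network
```

cluster2.yaml would be identical except for `clusterName: cluster-2`.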
DNS Across Clusters with ExternalDNS
ExternalDNS automatically creates DNS records in your DNS provider (Route53, Cloud DNS, Azure DNS) for Kubernetes services and ingresses. In a multi-cluster setup, it enables cross-cluster DNS resolution.
Pattern: each cluster writes to a shared DNS zone
# ExternalDNS configuration — cluster-specific txt-owner-id
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
spec:
  template:
    spec:
      containers:
      - name: external-dns
        args:
        - --source=service
        - --source=ingress
        - --provider=aws
        - --aws-zone-type=public
        - --registry=txt
        - --txt-owner-id=cluster-us-east-1 # Unique per cluster
        - --domain-filter=myapp.com
With txt-owner-id set per-cluster, each cluster creates and manages its own DNS records without overwriting records from other clusters.
Weighted routing for active-active: Route53 (and equivalents) support weighted routing. ExternalDNS with the weighted routing annotation enables percentage-based traffic splitting across clusters:
annotations:
  external-dns.alpha.kubernetes.io/aws-weight: "50"
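On Route53, weighted records also require a unique set identifier per cluster, which ExternalDNS exposes as a second annotation. A sketch of a fully annotated service in one of two clusters — the hostname and identifier are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: frontend
  annotations:
    external-dns.alpha.kubernetes.io/hostname: app.myapp.com
    external-dns.alpha.kubernetes.io/aws-weight: "50"                  # Share of traffic
    external-dns.alpha.kubernetes.io/set-identifier: cluster-us-east-1 # Unique per cluster
spec:
  type: LoadBalancer
  ports:
  - port: 80
```

The second cluster publishes the same hostname with its own set-identifier and weight, and Route53 splits traffic accordingly.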
Ingress and Cross-Cluster Load Balancing
For global load balancing across clusters, several patterns exist:
Pattern 1: Global DNS load balancing (simplest)
- Route53 / Cloud DNS GeoDNS routes users to the nearest cluster
- No cross-cluster traffic awareness
- Works with any ingress controller
- Failure mode: DNS TTL means slow failover (30-60 seconds minimum)
Pattern 2: Global load balancer (AWS Global Accelerator, Cloudflare)
- Anycast routing to nearest cluster
- Health check-based failover in seconds
- Works at layer 4 or layer 7
- Additional cost, but best latency and failover
Pattern 3: Service mesh with cross-cluster locality-aware routing (Istio)
- Locality-weighted load balancing prefers local cluster
- Automatic failover to remote cluster if local becomes unhealthy
- Requires Istio multi-cluster setup
- Most sophisticated, best for stateless services with strict latency requirements
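Pattern 3’s locality preference is configured with an Istio DestinationRule. A sketch assuming two regions — host and region names are illustrative, and outlier detection must be set for failover to trigger:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: my-service-failover
spec:
  host: my-service.platform-system.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        failover:
        - from: eu-west-1 # Illustrative regions
          to: us-east-1
    outlierDetection:     # Required for locality failover to activate
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```

With this in place, traffic stays in the local cluster until its endpoints are ejected as unhealthy, then shifts to the remote region automatically.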
Workload Migration with Liqo
Liqo enables live workload migration between Kubernetes clusters without application changes. It creates a virtual node in the source cluster that maps to capacity in the destination cluster — pods “scheduled” on the virtual node actually run in the destination cluster.
Use cases:
- Migrate workloads from on-premises to cloud (without reconfiguring workloads)
- Dynamic capacity bursting from one cluster to another
- Gradual migration during cluster upgrades
# Install Liqo on both clusters
liqoctl install --context cluster-source
liqoctl install --context cluster-destination

# Peer the clusters
liqoctl peer --context cluster-source \
  --remote-context cluster-destination
Once peered, Liqo advertises available capacity from the destination cluster as a virtual node in the source cluster. The Kubernetes scheduler places pods on the virtual node normally, and Liqo handles cross-cluster networking and pod mirroring.
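Which namespaces may spill over to the virtual node is controlled per namespace. Liqo models this with a NamespaceOffloading resource — a hedged sketch (field values are illustrative; verify the API version and fields against your Liqo release):

```yaml
# Hypothetical offloading policy for one namespace
apiVersion: offloading.liqo.io/v1alpha1
kind: NamespaceOffloading
metadata:
  name: offloading   # Liqo expects this fixed name
  namespace: my-app  # Namespace extended to the peered cluster
spec:
  podOffloadingStrategy: LocalAndRemote # Pods may run in either cluster
  namespaceMappingStrategy: DefaultName
```

Namespaces without such a policy keep their pods strictly local.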
Multi-Cluster Maturity Stages
Stage 1: Manual management, separate kubeconfigs, kubectl context switching. Fine for 2-3 clusters.
Stage 2: ArgoCD with multiple cluster registrations. Centralized deployment, but no unified networking or service discovery.
Stage 3: ArgoCD ApplicationSets + Cluster API for lifecycle management. Full GitOps for cluster provisioning and application deployment.
Stage 4: Service mesh (Cilium ClusterMesh or Istio) enabling transparent cross-cluster service calls. Global load balancing with ExternalDNS.
Stage 5: Full fleet management with automated cluster upgrades, cross-cluster observability (traces spanning multiple clusters), FinOps attribution per cluster/team.
Most teams should reach Stage 3 before expanding further.
Design Your Multi-Cluster Strategy
Multi-cluster architecture decisions made early are hard to reverse. The networking model, GitOps structure, and observability design need to be thought through before the second cluster is provisioned.
→ Multi-Cluster Strategy service at kubernetes.ae — we design the architecture, select the tooling, and implement the first production multi-cluster setup with full GitOps, networking, and observability.
Get Expert Kubernetes Help
Talk to a certified Kubernetes expert. Free 30-minute consultation — actionable findings within days.
Talk to an Expert