
Lab Setup: Chaos Mesh, Scaling, and Topology Spread Constraints

This guide outlines steps to enhance the resilience of a UI service by implementing high availability practices. We'll cover installing Chaos Mesh with Helm, scaling the UI service, adding topology spread constraints, and using a helper script to visualize pod distribution across availability zones.

Installing Chaos Mesh

To enhance our cluster's resilience testing capabilities, we'll install Chaos Mesh. Chaos Mesh is a powerful chaos engineering tool for Kubernetes environments. It allows us to simulate various failure scenarios and test how our applications respond.

Let's install Chaos Mesh in our cluster using Helm:

~$helm repo add chaos-mesh https://charts.chaos-mesh.org
~$helm upgrade --install chaos-mesh chaos-mesh/chaos-mesh \
--namespace chaos-mesh \
--create-namespace \
--version 2.5.1 \
--set dashboard.create=true \
--wait
 
Release "chaos-mesh" does not exist. Installing it now.
NAME: chaos-mesh
LAST DEPLOYED: Tue Aug 20 04:44:31 2024
NAMESPACE: chaos-mesh
STATUS: deployed
REVISION: 1
TEST SUITE: None
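
Chaos Mesh experiments are defined as Kubernetes custom resources. As a minimal illustrative sketch (not a step in this lab; the resource name is a placeholder), a PodChaos experiment that kills one of the UI pods could look like this:

apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: ui-pod-kill-example # placeholder name for illustration
  namespace: ui
spec:
  action: pod-kill
  mode: one # terminate a single matching pod
  selector:
    namespaces:
      - ui
    labelSelectors:
      app: ui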
 

Scaling and Topology Spread Constraints

We use a Kustomize patch to modify the UI deployment, scaling it to 5 replicas and adding topology spread constraints. These constraints distribute the UI pods across availability zones and nodes, reducing the impact of a single node or zone failure.

Here's the content of our patch file:

~/environment/eks-workshop/modules/observability/resiliency/high-availability/config/scale_and_affinity_patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ui
  namespace: ui
spec:
  replicas: 5
  selector:
    matchLabels:
      app: ui
  template:
    metadata:
      labels:
        app: ui
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: ui
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: ui
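
If you'd like to inspect the rendered manifests before applying them, you can build the Kustomization locally as an optional check:

~$kubectl kustomize ~/environment/eks-workshop/modules/observability/resiliency/high-availability/config/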

Apply the changes using the Kustomize patch and Kustomization file. Deleting the existing Deployment first ensures all pods are recreated and scheduled with the new constraints:

~$kubectl delete deployment ui -n ui
~$kubectl apply -k ~/environment/eks-workshop/modules/observability/resiliency/high-availability/config/
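
Optionally, confirm that the recreated Deployment has rolled out all 5 replicas before continuing:

~$kubectl rollout status deployment/ui -n ui --timeout=180s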

Verify Retail Store Accessibility

After applying these changes, it's important to verify that your retail store is accessible:

~$wait-for-lb $(kubectl get ingress -n ui -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}')
 
Waiting for k8s-ui-ui-5ddc3ba496-721427594.us-west-2.elb.amazonaws.com...
You can now access http://k8s-ui-ui-5ddc3ba496-721427594.us-west-2.elb.amazonaws.com

Once this command completes, it will output a URL. Open this URL in a new browser tab to verify that your retail store is accessible and functioning correctly.

tip

The retail store URL may take 5-10 minutes to become operational.
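
If the page isn't reachable yet, you can also poll the endpoint from the terminal; this optional check reuses the same ingress lookup as the command above and prints the HTTP status code:

~$curl -s -o /dev/null -w "%{http_code}\n" "http://$(kubectl get ingress -n ui -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}')"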

Helper Script: Get Pods by AZ

The get-pods-by-az.sh script visualizes the distribution of Kubernetes pods across availability zones in the terminal. You can view the script file on GitHub.
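
The core idea is straightforward: list the nodes in each availability zone via the topology.kubernetes.io/zone node label, then list the pods scheduled on each node. Here is a simplified sketch of that approach (not the actual workshop script; the zone names and the ui namespace are assumptions):

#!/bin/bash
# Simplified sketch, not the actual workshop script:
# group pods in the ui namespace by availability zone using node labels.
for az in us-west-2a us-west-2b us-west-2c; do # assumed zone names
  echo "------${az}------"
  for node in $(kubectl get nodes -l topology.kubernetes.io/zone=${az} \
      -o jsonpath='{.items[*].metadata.name}'); do
    echo "  ${node}:"
    # pods scheduled on this node
    kubectl get pods -n ui --field-selector spec.nodeName=${node} --no-headers
  done
done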

Script Execution

To run the script and see the distribution of pods across availability zones, execute:

~$timeout 10s $SCRIPT_DIR/get-pods-by-az.sh | head -n 30
 
------us-west-2a------
  ip-10-42-127-82.us-west-2.compute.internal:
       ui-6dfb84cf67-6fzrk   1/1   Running   0     56s
       ui-6dfb84cf67-dsp55   1/1   Running   0     56s
 
------us-west-2b------
  ip-10-42-153-179.us-west-2.compute.internal:
       ui-6dfb84cf67-2pxnp   1/1   Running   0     59s
 
------us-west-2c------
  ip-10-42-186-246.us-west-2.compute.internal:
       ui-6dfb84cf67-n8x4f   1/1   Running   0     61s
       ui-6dfb84cf67-wljth   1/1   Running   0     61s
 
info

For more information on these changes, check out the Kubernetes documentation on topology spread constraints and the Chaos Mesh documentation.