Lab Setup: Chaos Mesh, Scaling, and Pod affinity
This guide outlines steps to enhance the resilience of a UI service by implementing high availability practices. We'll cover installing helm, scaling the UI service, implementing pod anti-affinity, and using a helper script to visualize pod distribution across availability zones.
Installing Chaos Mesh
To enhance our cluster's resilience testing capabilities, we'll install Chaos Mesh. Chaos Mesh is a powerful chaos engineering tool for Kubernetes environments. It allows us to simulate various failure scenarios and test how our applications respond.
Let's install Chaos Mesh in our cluster using Helm:
Release "chaos-mesh" does not exist. Installing it now.
NAME: chaos-mesh
LAST DEPLOYED: Tue Aug 20 04:44:31 2024
NAMESPACE: chaos-mesh
STATUS: deployed
REVISION: 1
TEST SUITE: None
Scaling and Topology Spread Constraints
We use a Kustomize patch to modify the UI deployment, scaling it to 5 replicas and adding topology spread constraints rules. This ensures UI pods are distributed across different nodes, reducing the impact of node failures.
Here's the content of our patch file:
- Kustomize Patch
- Deployment/ui
- Diff
apiVersion: apps/v1
kind: Deployment
metadata:
name: ui
namespace: ui
spec:
replicas: 5
selector:
matchLabels:
app: ui
template:
metadata:
labels:
app: ui
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app: ui
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app: ui
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app.kubernetes.io/created-by: eks-workshop
app.kubernetes.io/type: app
name: ui
namespace: ui
spec:
replicas: 5
selector:
matchLabels:
app: ui
app.kubernetes.io/component: service
app.kubernetes.io/instance: ui
app.kubernetes.io/name: ui
template:
metadata:
annotations:
prometheus.io/path: /actuator/prometheus
prometheus.io/port: "8080"
prometheus.io/scrape: "true"
labels:
app: ui
app.kubernetes.io/component: service
app.kubernetes.io/created-by: eks-workshop
app.kubernetes.io/instance: ui
app.kubernetes.io/name: ui
spec:
containers:
- env:
- name: JAVA_OPTS
value: -XX:MaxRAMPercentage=75.0 -Djava.security.egd=file:/dev/urandom
envFrom:
- configMapRef:
name: ui
image: public.ecr.aws/aws-containers/retail-store-sample-ui:0.4.0
imagePullPolicy: IfNotPresent
livenessProbe:
httpGet:
path: /actuator/health/liveness
port: 8080
initialDelaySeconds: 45
periodSeconds: 20
name: ui
ports:
- containerPort: 8080
name: http
protocol: TCP
resources:
limits:
memory: 1.5Gi
requests:
cpu: 250m
memory: 1.5Gi
securityContext:
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
volumeMounts:
- mountPath: /tmp
name: tmp-volume
securityContext:
fsGroup: 1000
serviceAccountName: ui
topologySpreadConstraints:
- labelSelector:
matchLabels:
app: ui
maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
- labelSelector:
matchLabels:
app: ui
maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
volumes:
- emptyDir:
medium: Memory
name: tmp-volume
app.kubernetes.io/type: app
name: ui
namespace: ui
spec:
- replicas: 1
+ replicas: 5
selector:
matchLabels:
+ app: ui
app.kubernetes.io/component: service
app.kubernetes.io/instance: ui
app.kubernetes.io/name: ui
template:
[...]
prometheus.io/path: /actuator/prometheus
prometheus.io/port: "8080"
prometheus.io/scrape: "true"
labels:
+ app: ui
app.kubernetes.io/component: service
app.kubernetes.io/created-by: eks-workshop
app.kubernetes.io/instance: ui
app.kubernetes.io/name: ui
[...]
name: tmp-volume
securityContext:
fsGroup: 1000
serviceAccountName: ui
+ topologySpreadConstraints:
+ - labelSelector:
+ matchLabels:
+ app: ui
+ maxSkew: 1
+ topologyKey: topology.kubernetes.io/zone
+ whenUnsatisfiable: ScheduleAnyway
+ - labelSelector:
+ matchLabels:
+ app: ui
+ maxSkew: 1
+ topologyKey: kubernetes.io/hostname
+ whenUnsatisfiable: ScheduleAnyway
volumes:
- emptyDir:
medium: Memory
name: tmp-volume
Apply the changes using Kustomize patch and Kustomization file:
Verify Retail Store Accessibility
After applying these changes, it's important to verify that your retail store is accessible:
Waiting for k8s-ui-ui-5ddc3ba496-721427594.us-west-2.elb.amazonaws.com...
You can now access http://k8s-ui-ui-5ddc3ba496-721427594.us-west-2.elb.amazonaws.com
Once this command completes, it will output a URL. Open this URL in a new browser tab to verify that your retail store is accessible and functioning correctly.
The retail url may take 5-10 minutes to become operational.
Helper Script: Get Pods by AZ
The get-pods-by-az.sh
script helps visualize the distribution of Kubernetes pods across different availability zones in the terminal. You can view the script file on github here.
Script Execution
To run the script and see the distribution of pods across availability zones, execute:
------us-west-2a------
ip-10-42-127-82.us-west-2.compute.internal:
ui-6dfb84cf67-6fzrk 1/1 Running 0 56s
ui-6dfb84cf67-dsp55 1/1 Running 0 56s
------us-west-2b------
ip-10-42-153-179.us-west-2.compute.internal:
ui-6dfb84cf67-2pxnp 1/1 Running 0 59s
------us-west-2c------
ip-10-42-186-246.us-west-2.compute.internal:
ui-6dfb84cf67-n8x4f 1/1 Running 0 61s
ui-6dfb84cf67-wljth 1/1 Running 0 61s
For more information on these changes, check out these sections: