Karpenter Setup

In this section we will configure Karpenter to allow the creation of Inferentia and Trainium EC2 instances. Karpenter detects pending Pods that require an inf2 or trn1 instance and launches the required instance type so the Pod can be scheduled.
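For example, a Pod that requests a Neuron device (exposed by the AWS Neuron device plugin as the `aws.amazon.com/neuron` resource) stays pending until a node with that device exists. The sketch below uses a hypothetical placeholder image; the `instanceType: neuron` node selector matches a label applied by the NodePool we create later in this section:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-example
spec:
  nodeSelector:
    instanceType: neuron # label applied by the NodePool below
  containers:
    - name: app
      image: my-inference-image:latest # hypothetical placeholder image
      resources:
        limits:
          aws.amazon.com/neuron: 1 # requires the Neuron device plugin
```

When this Pod cannot be scheduled on any existing node, Karpenter evaluates its requirements and provisions a matching instance.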

tip

You can learn more about Karpenter in the Karpenter module that's provided in this workshop.

Karpenter has been installed in our EKS cluster, and runs as a deployment:

~$ kubectl get deployment -n kube-system
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
...
karpenter   2/2     2            2           11m

Karpenter requires a NodePool to provision nodes. This is the Karpenter NodePool that we will create:

~/environment/eks-workshop/modules/aiml/inferentia/nodepool/nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: aiml
spec:
  template:
    metadata:
      labels:
        instanceType: "neuron"
        provisionerType: "karpenter"
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - inf2
            - trn1
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: aiml
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: aiml
spec:
  amiFamily: AL2
  amiSelectorTerms:
    - alias: al2@latest
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        volumeSize: 100Gi
        volumeType: gp3
  role: ${KARPENTER_NODE_ROLE}
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${EKS_CLUSTER_NAME}
  tags:
    app.kubernetes.io/created-by: eks-workshop
The requirements section specifies which instances this NodePool is allowed to provision for us. You can see that we've configured it to only allow the creation of on-demand inf2 and trn1 instances.

Apply the NodePool and EC2NodeClass manifests:

~$ kubectl kustomize ~/environment/eks-workshop/modules/aiml/inferentia/nodepool \
| envsubst | kubectl apply -f-

The NodePool is now ready to provision nodes for our training and inference Pods.
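As a sketch of what comes next (the image and Job name are hypothetical placeholders), a training workload could target these nodes with the labels the NodePool applies and a Neuron device request:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: training-example
spec:
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        provisionerType: karpenter # label applied by the aiml NodePool
      containers:
        - name: train
          image: my-training-image:latest # hypothetical placeholder image
          resources:
            limits:
              aws.amazon.com/neuron: 1 # Neuron device, via the Neuron device plugin
```

Because the NodePool restricts the instance family to inf2 and trn1, any node Karpenter launches to satisfy this Job will carry a Neuron accelerator.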