Amazon Web Services
This small tutorial will walk through how to run the Flux Operator (from a development standpoint) on AWS.
Install
You should first install eksctl and make sure you have access to an AWS cloud (e.g., with credentials or similar in your environment). E.g.,:
export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxxxxxxx
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export AWS_SESSION_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
The last export (the session token) may not be required depending on your setup. We assume you also have kubectl installed.
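Before proceeding, you can sanity check your setup. This is an optional sketch that assumes you also have the aws CLI installed:

# Confirm your AWS credentials resolve to an identity
$ aws sts get-caller-identity

# Confirm kubectl is on your PATH
$ kubectl version --client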
Setup SSH
You’ll need an ssh key for EKS. Here is how to generate it:
# Generate a key pair at ~/.ssh/id_eks
ssh-keygen -f ~/.ssh/id_eks
This is used so you can ssh (connect) to your workers!
Create Cluster
Next, let’s create our cluster using eksctl (“eks control”). IMPORTANT: you absolutely need to choose an instance type that has IsTrunkingCompatible set to true. Here is an example configuration. Note that we are choosing zones that work for our account (this might vary for you) and an instance size that is appropriate for our workloads. Also note that we are including the path to the ssh key we just generated.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: flux-operator
  region: us-east-1
  version: "1.22"
availabilityZones: ["us-east-1a", "us-east-1b", "us-east-1d"]
managedNodeGroups:
  - name: workers
    instanceType: c5.xlarge
    minSize: 4
    maxSize: 4
    labels: { "fluxoperator": "true" }
    ssh:
      allow: true
      publicKeyPath: ~/.ssh/id_eks.pub
Given the above file eks-cluster-config.yaml, we create the cluster as follows:
$ eksctl create cluster -f eks-cluster-config.yaml
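Cluster creation takes a while (see the warning below). While it runs, you can check on progress from a second terminal:

# List clusters in the region; the new cluster appears once registered
$ eksctl get cluster --region us-east-1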
🚧️ Warning! 🚧️ The above takes 15-20 minutes! Go have a party! Grab an avocado! 🥑️ And then come back and view your nodes.
$ kubectl get nodes
NAME                             STATUS   ROLES    AGE     VERSION
ip-192-168-28-166.ec2.internal   Ready    <none>   4m58s   v1.22.12-eks-be74326
ip-192-168-4-145.ec2.internal    Ready    <none>   4m27s   v1.22.12-eks-be74326
ip-192-168-49-92.ec2.internal    Ready    <none>   5m3s    v1.22.12-eks-be74326
ip-192-168-79-92.ec2.internal    Ready    <none>   4m57s   v1.22.12-eks-be74326
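Since we gave eksctl our public key, you can also connect to a worker directly. This is a sketch only: the user and address depend on your node AMI and networking (ec2-user is the default user for Amazon Linux nodes, and the node needs an address reachable from your machine):

# Connect to a worker using the key we generated earlier
$ ssh -i ~/.ssh/id_eks ec2-user@<node-public-ip>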
Deploy Operator
To deploy the Flux Operator, choose one of the options here. Whether you apply a yaml file, use flux-cloud, or clone the repository and run make deploy, you will see the operator install to the operator-system namespace.
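For example, the last option might look like this (a sketch, assuming you are working from a clone of the flux-operator repository):

$ git clone https://github.com/flux-framework/flux-operator
$ cd flux-operator
$ make deploy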
...
namespace/operator-system created
customresourcedefinition.apiextensions.k8s.io/miniclusters.flux-framework.org unchanged
serviceaccount/operator-controller-manager created
role.rbac.authorization.k8s.io/operator-leader-election-role created
clusterrole.rbac.authorization.k8s.io/operator-manager-role configured
clusterrole.rbac.authorization.k8s.io/operator-metrics-reader unchanged
clusterrole.rbac.authorization.k8s.io/operator-proxy-role unchanged
rolebinding.rbac.authorization.k8s.io/operator-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/operator-manager-rolebinding unchanged
clusterrolebinding.rbac.authorization.k8s.io/operator-proxy-rolebinding unchanged
configmap/operator-manager-config created
service/operator-controller-manager-metrics-service created
deployment.apps/operator-controller-manager created
Ensure the operator-system namespace was created:
$ kubectl get namespace
NAME              STATUS   AGE
default           Active   12m
kube-node-lease   Active   12m
kube-public       Active   12m
kube-system       Active   12m
operator-system   Active   11s
$ kubectl describe namespace operator-system
Name:         operator-system
Labels:       control-plane=controller-manager
              kubernetes.io/metadata.name=operator-system
Annotations:  <none>
Status:       Active
No resource quota.
No LimitRange resource.
And you can find the name of the operator pod as follows:
$ kubectl get pod --all-namespaces -o wide
NAMESPACE         NAME                                           READY   STATUS    RESTARTS   AGE   IP              NODE                             NOMINATED NODE   READINESS GATES
operator-system   operator-controller-manager-6c699b7b94-bbp5q   2/2     Running   0          80s   192.168.30.43   ip-192-168-28-166.ec2.internal   <none>           <none>
Make your namespace for the flux-operator custom resource definition (CRD):
$ kubectl create namespace flux-operator
Then apply your CRD to generate the MiniCluster (the default size should be 4, the maximum number of nodes in our cluster):
$ make apply
# OR
$ kubectl apply -f config/samples/flux-framework.org_v1alpha1_minicluster.yaml
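If you are curious, here is a minimal sketch of what a MiniCluster definition looks like. The field names assume the v1alpha1 schema and the image is a hypothetical placeholder, so treat the sample file in the repository as the source of truth:

# A sketch of a MiniCluster (hedged; see config/samples/ for the real one)
apiVersion: flux-framework.org/v1alpha1
kind: MiniCluster
metadata:
  name: flux-sample
  namespace: flux-operator
spec:
  # Match the four nodes in our EKS cluster
  size: 4
  containers:
    # Hypothetical placeholder image; the sample file defines the real one
    - image: ghcr.io/flux-framework/flux-restful-api:latest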
And now you can get logs for the manager:
$ kubectl logs -n operator-system operator-controller-manager-6c699b7b94-bbp5q
You’ll see “errors” that the ip addresses aren’t ready yet, and the operator will reconcile until they are. You can add -f to keep following the logs and watch:
$ kubectl logs -n operator-system operator-controller-manager-6c699b7b94-bbp5q -f
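You can also watch the MiniCluster pods come up directly in their namespace:

$ kubectl get pods -n flux-operator --watch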
Once the logs indicate they are ready, you can look at the pod listing and the log for the indexed job (choosing one pod at random to show):
$ make list
kubectl get -n flux-operator pods
NAME                  READY   STATUS    RESTARTS   AGE
flux-sample-0-zfmzc   1/1     Running   0          2m11s
flux-sample-1-p2hh5   1/1     Running   0          2m11s
flux-sample-2-zs4h6   1/1     Running   0          2m11s
flux-sample-3-prtn9   1/1     Running   0          2m11s
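And then grab the log for (for example) the first pod in the listing above:

$ kubectl logs -n flux-operator flux-sample-0-zfmzc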
And when the containers are running, in the logs you’ll see lots of cute emojis to indicate progress, and then the start of your server! You’ll need an exposed host to see the user interface, or you can submit jobs via the RESTful API. A Python client is available here.
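If you don’t have an exposed host, a port forward is one way to reach the server locally. A sketch, assuming the server listens on port 5000 (substitute your own pod name from the listing above):

# Forward local port 5000 to the pod
$ kubectl port-forward -n flux-operator flux-sample-0-zfmzc 5000:5000

Then open http://localhost:5000 in your browser.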
Clean up
Make sure you clean everything up!
$ make undeploy
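If you created the MiniCluster with kubectl apply instead, you can delete it (and its namespace) the same way:

$ kubectl delete -f config/samples/flux-framework.org_v1alpha1_minicluster.yaml
$ kubectl delete namespace flux-operator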
And then:
$ eksctl delete cluster -f eks-cluster-config.yaml
It might be better to add --wait, which will wait until all resources are cleaned up:
$ eksctl delete cluster -f eks-cluster-config.yaml --wait
Either way, it’s good to check the web console too to ensure you didn’t miss anything.
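Since eksctl provisions everything through CloudFormation, one hedged command-line check is to look for stacks that failed to delete:

# Any leftover eksctl-flux-operator-* stacks here need manual attention
$ aws cloudformation list-stacks --stack-status-filter DELETE_FAILED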