Amazon Web Services

This small tutorial will walk through how to run the Flux Operator (from a development standpoint) on AWS.


You should first install eksctl and make sure you have access to an AWS cloud (e.g., with credentials exported in your environment):

export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxxxxxxx
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export AWS_SESSION_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The last session token may not be required depending on your setup. We assume you also have kubectl.
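Before creating a cluster, it can save you a failed `eksctl` run to confirm the required variables are actually set. The helper below is hypothetical (not part of the Flux Operator or eksctl), just a quick sanity check; `AWS_SESSION_TOKEN` is left out since it is optional:

```shell
# Hypothetical helper: fail fast if the credential variables above are missing.
check_aws_env() {
  local var missing=0
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var
    if [ -z "${!var}" ]; then
      echo "Missing required variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Warn (but do not exit) if anything is missing
check_aws_env || echo "Set the variables above before running eksctl."
```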

Setup SSH

You’ll need an ssh key for EKS. Here is how to generate it:

$ ssh-keygen
# Ensure you enter the path to ~/.ssh/id_eks

This is used so you can ssh (connect) to your workers!

Create Cluster

Next, let’s create our cluster using eksctl (“eks control”). IMPORTANT: you absolutely need to choose an instance size that has IsTrunkingCompatible true. Here is an example configuration. Note that we are choosing zones that work for our account (this might vary for you) and an instance size that is appropriate for our workloads. Also note that we are including the path to the ssh key we just generated.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: flux-operator
  region: us-east-1
  version: "1.22"

availabilityZones: ["us-east-1a", "us-east-1b", "us-east-1d"]
managedNodeGroups:
  - name: workers
    instanceType: c5.xlarge
    minSize: 4
    maxSize: 4
    labels: { "fluxoperator": "true" }
    ssh:
      allow: true
      publicKeyPath: ~/.ssh/

Given the above file eks-cluster-config.yaml we create the cluster as follows:

$ eksctl create cluster -f eks-cluster-config.yaml

🚧️ Warning! 🚧️ The above takes 15-20 minutes! Go have a party! Grab an avocado! 🥑️ And then come back and view your nodes.

$ kubectl get nodes
NAME                             STATUS   ROLES    AGE     VERSION
ip-192-168-28-166.ec2.internal   Ready    <none>   4m58s   v1.22.12-eks-be74326
ip-192-168-4-145.ec2.internal    Ready    <none>   4m27s   v1.22.12-eks-be74326
ip-192-168-49-92.ec2.internal    Ready    <none>   5m3s    v1.22.12-eks-be74326
ip-192-168-79-92.ec2.internal    Ready    <none>   4m57s   v1.22.12-eks-be74326

Deploy Operator

To deploy the Flux Operator, choose one of the options here. Whether you apply a yaml file, use flux-cloud, or clone the repository and run make deploy, you will see the operator install to the operator-system namespace.

namespace/operator-system created
serviceaccount/operator-controller-manager created
...
configmap/operator-manager-config created
service/operator-controller-manager-metrics-service created
deployment.apps/operator-controller-manager created

Ensure the operator-system namespace was created:

$ kubectl get namespace
NAME              STATUS   AGE
default           Active   12m
kube-node-lease   Active   12m
kube-public       Active   12m
kube-system       Active   12m
operator-system   Active   11s
$ kubectl describe namespace operator-system
Name:         operator-system
Labels:       control-plane=controller-manager
Annotations:  <none>
Status:       Active

No resource quota.

No LimitRange resource.

And you can find the name of the operator pod as follows:

$ kubectl get pod --all-namespaces -o wide
operator-system   operator-controller-manager-6c699b7b94-bbp5q   2/2     Running   0             80s    ip-192-168-28-166.ec2.internal   <none>           <none>

Make your namespace for the flux-operator custom resource definition (CRD):

$ kubectl create namespace flux-operator

Then apply your CRD to generate the MiniCluster (default should be size 4, the max nodes of our cluster):

$ make apply
# OR
$ kubectl apply -f config/samples/flux-framework.org_v1alpha1_minicluster.yaml 
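For reference, a minimal MiniCluster custom resource looks roughly like the following. This is a sketch, not the exact sample shipped in config/samples: the image and command are placeholders, and the field names assume the v1alpha1 API, so check the sample file for the authoritative version.

```yaml
apiVersion: flux-framework.org/v1alpha1
kind: MiniCluster
metadata:
  name: flux-sample
  namespace: flux-operator
spec:
  # size should match (or be below) the node group size of your cluster (4 above)
  size: 4
  containers:
    - image: <your-flux-enabled-image>   # placeholder
      command: <your-command>            # placeholder
```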

And now you can get logs for the manager:

$ kubectl logs -n operator-system operator-controller-manager-6c699b7b94-bbp5q

You’ll see “errors” that the ip addresses aren’t ready yet, and the operator will reconcile until they are. You can add -f so the logs hang to watch:

$ kubectl logs -n operator-system operator-controller-manager-6c699b7b94-bbp5q -f

Once the logs indicate the cluster is ready, you can look at the listing of nodes and the log for the indexed job (choosing one pod at random to show):

$ make list
kubectl get -n flux-operator pods
NAME                  READY   STATUS    RESTARTS   AGE
flux-sample-0-zfmzc   1/1     Running   0          2m11s
flux-sample-1-p2hh5   1/1     Running   0          2m11s
flux-sample-2-zs4h6   1/1     Running   0          2m11s
flux-sample-3-prtn9   1/1     Running   0          2m11s

And when the containers are running, the logs will show lots of cute emojis to indicate progress, and then the start of your server! You’ll need an exposed host to see the user interface, or you can interact to submit jobs via the RESTful API. A Python client is available here.
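If you just want to poke at the RESTful API locally without exposing a host, one option is a port forward. This is a sketch: the pod name comes from the listing above (yours will differ), and port 5000 is an assumption, so check your MiniCluster spec for the actual server port:

```shell
# Forward local port 5000 to the server in one MiniCluster pod
# (pod name and port are assumptions -- substitute your own)
kubectl port-forward -n flux-operator flux-sample-0-zfmzc 5000:5000
```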

Clean up

Make sure you clean everything up!

$ make undeploy

And then:

$ eksctl delete cluster -f eks-cluster-config.yaml

It might be better to add --wait, which will wait until all resources are cleaned up:

$ eksctl delete cluster -f eks-cluster-config.yaml --wait

Either way, it’s good to check the web console too to ensure you didn’t miss anything.
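From the command line (assuming you have the AWS CLI installed), one way to double check is to look for leftover CloudFormation stacks, since eksctl manages the cluster through CloudFormation:

```shell
# List stacks that failed to delete; empty output means nothing is stuck
aws cloudformation list-stacks --stack-status-filter DELETE_FAILED
```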

Last update: Jan 27, 2023