Amazon Web Services

This set of tutorials will walk through how to run the Flux Operator on AWS! You can start with setup and then move down to examples.

Setup

Install

You should first install eksctl and make sure you have access to an AWS cloud (e.g., with credentials or similar exported in your environment). For example:

export AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxxxxxxx
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export AWS_SESSION_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The last session token may not be required depending on your setup. We assume you also have kubectl.
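Before moving on, it can help to sanity check that the tools and credentials are in place. These checks are our suggestion rather than part of the original tutorial (and assume the AWS CLI is installed as well):

$ aws sts get-caller-identity   # should print your account and user/role
$ eksctl version
$ kubectl version --client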

Setup SSH

This step is optional, and only needed if you want to access your nodes. You’ll need an ssh key for EKS. Here is how to generate it:

ssh-keygen
# Ensure you enter the path to ~/.ssh/id_eks

This is used so you can ssh (connect) to your workers!
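If you prefer to generate the key non-interactively, a small sketch (the key type and empty passphrase are just our choice):

$ ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_eks -N ""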

Create Cluster

Next, let’s create our cluster using eksctl (“eks control”). IMPORTANT: you absolutely need to choose an instance size that has IsTrunkingCompatible set to true. Here is an example configuration. Note that we are choosing zones that work for our account (this might vary for you) and an instance size that is appropriate for our workloads. Also note that we are including the path to the ssh key we just generated.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: flux-operator
  region: us-east-1
  version: "1.22"

availabilityZones: ["us-east-1a", "us-east-1b", "us-east-1d"]
managedNodeGroups:
  - name: workers
    instanceType: c5.xlarge
    minSize: 4
    maxSize: 4
    labels: { "fluxoperator": "true" }
    ssh:
      allow: true
      publicKeyPath: ~/.ssh/id_eks.pub

If you don’t need an ssh key:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: flux-operator
  region: us-east-1
  version: "1.23"

availabilityZones: ["us-east-1a", "us-east-1b", "us-east-1d"]
managedNodeGroups:
  - name: workers
    instanceType: c5.xlarge
    minSize: 4
    maxSize: 4
    labels: { "fluxoperator": "true" }

Given the above file eksctl-config.yaml, we create the cluster as follows:

$ eksctl create cluster -f eksctl-config.yaml

# or use the provided example config
$ eksctl create cluster -f ./examples/storage/aws/oidc/eksctl-config.yaml

🚧️ Warning! 🚧️ The above takes 15-20 minutes! Go have a party! Grab an avocado! 🥑️ And then come back and view your nodes.

$ kubectl get nodes
NAME                             STATUS   ROLES    AGE     VERSION
ip-192-168-28-166.ec2.internal   Ready    <none>   4m58s   v1.22.12-eks-be74326
ip-192-168-4-145.ec2.internal    Ready    <none>   4m27s   v1.22.12-eks-be74326
ip-192-168-49-92.ec2.internal    Ready    <none>   5m3s    v1.22.12-eks-be74326
ip-192-168-79-92.ec2.internal    Ready    <none>   4m57s   v1.22.12-eks-be74326
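eksctl typically writes the new cluster’s credentials into your kubeconfig automatically. If kubectl can’t see the cluster, one option (not required by the tutorial) is to refresh the kubeconfig with the AWS CLI:

$ aws eks update-kubeconfig --region us-east-1 --name flux-operator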

Deploy Operator

To deploy the Flux Operator, choose one of the options here: apply a yaml file, use flux-cloud, or clone the repository and run make deploy. You can also deploy a development image:

$ make test-deploy
# OR
$ kubectl apply -f examples/dist/flux-operator-dev.yaml

You will see the operator install to the operator-system namespace:

...
namespace/operator-system created
customresourcedefinition.apiextensions.k8s.io/miniclusters.flux-framework.org unchanged
serviceaccount/operator-controller-manager created
role.rbac.authorization.k8s.io/operator-leader-election-role created
clusterrole.rbac.authorization.k8s.io/operator-manager-role configured
clusterrole.rbac.authorization.k8s.io/operator-metrics-reader unchanged
clusterrole.rbac.authorization.k8s.io/operator-proxy-role unchanged
rolebinding.rbac.authorization.k8s.io/operator-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/operator-manager-rolebinding unchanged
clusterrolebinding.rbac.authorization.k8s.io/operator-proxy-rolebinding unchanged
configmap/operator-manager-config created
service/operator-controller-manager-metrics-service created
deployment.apps/operator-controller-manager created

Ensure the operator-system namespace was created:

$ kubectl get namespace
NAME              STATUS   AGE
default           Active   12m
kube-node-lease   Active   12m
kube-public       Active   12m
kube-system       Active   12m
operator-system   Active   11s
$ kubectl describe namespace operator-system
Name:         operator-system
Labels:       control-plane=controller-manager
              kubernetes.io/metadata.name=operator-system
Annotations:  <none>
Status:       Active

No resource quota.

No LimitRange resource.

And you can find the name of the operator system pod as follows:

$ kubectl get pod --all-namespaces
...
operator-system   operator-controller-manager-6c699b7b94-bbp5q   2/2     Running   0             80s   192.168.30.43    ip-192-168-28-166.ec2.internal   <none>           <none>
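If you’d rather not scan all namespaces, a sketch that filters by label (this assumes the operator pods carry the standard control-plane=controller-manager label, which you can confirm with kubectl describe):

$ kubectl get pods -n operator-system -l control-plane=controller-manager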

Create the flux-operator namespace

Make your namespace for the flux-operator custom resource definition (CRD):

$ kubectl create namespace flux-operator

Examples

After setup, choose one of the following examples to run.

Run LAMMPS

Then apply your CRD to generate the MiniCluster (the default size should be 4, the maximum number of nodes in our cluster):

$ make apply
# OR
$ kubectl apply -f config/samples/flux-framework.org_v1alpha1_minicluster.yaml

And now you can get logs for the manager:

$ kubectl logs -n operator-system operator-controller-manager-6c699b7b94-bbp5q

You’ll see “errors” that the ip addresses aren’t ready yet, and the operator will reconcile until they are. You can add -f to follow (watch) the logs:

$ kubectl logs -n operator-system operator-controller-manager-6c699b7b94-bbp5q -f

Once the logs indicate they are ready, you can look at the listing of pods and the log for the indexed job (choosing one pod at random to show, as in the example after the listing):

$ make list
kubectl get -n flux-operator pods
NAME                  READY   STATUS    RESTARTS   AGE
flux-sample-0-zfmzc   1/1     Running   0          2m11s
flux-sample-1-p2hh5   1/1     Running   0          2m11s
flux-sample-2-zs4h6   1/1     Running   0          2m11s
flux-sample-3-prtn9   1/1     Running   0          2m11s
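To see the log for the indexed job, pick one of the pods above (the pod names will differ on your cluster), e.g.:

$ kubectl logs -n flux-operator flux-sample-0-zfmzc -f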

And when the containers are running, in the logs you’ll see lots of cute emojis to indicate progress, and then the start of your server! You’ll need an exposed host to see the user interface, or you can interact to submit jobs via the RESTful API. A Python client is available here. To use Flux Cloud to programmatically submit jobs, see the guides here.

Run Snakemake with a Shared Filesystem

This small tutorial will run a Snakemake workflow on AWS that requires a shared filesystem.

Prepare S3 Storage

Let’s first get our Snakemake pre-requisite analysis files into S3. To start, prepare your data in a temporary directory.

$ git clone --depth 1 https://github.com/snakemake/snakemake-tutorial-data /tmp/workflow

You’ll want to add the Snakefile for your workflow along with a plotting script:

$ wget -O /tmp/workflow/Snakefile https://raw.githubusercontent.com/rse-ops/flux-hpc/main/snakemake/atacseq/Snakefile
$ mkdir -p /tmp/workflow/scripts
$ wget -O /tmp/workflow/scripts/plot-quals.py https://raw.githubusercontent.com/rse-ops/flux-hpc/main/snakemake/atacseq/scripts/plot-quals.py

You should have this structure:

$ tree /tmp/workflow
/tmp/workflow/
├── data
│   ├── genome.fa
│   ├── genome.fa.amb
│   ├── genome.fa.ann
│   ├── genome.fa.bwt
│   ├── genome.fa.fai
│   ├── genome.fa.pac
│   ├── genome.fa.sa
│   └── samples
│       ├── A.fastq
│       ├── B.fastq
│       └── C.fastq
├── Dockerfile
├── environment.yaml
├── README.md
├── scripts
│   └── plot-quals.py
└── Snakefile

We can then use the aws command line client to make a bucket (mb) and upload to it.

$ aws s3 mb s3://flux-operator-bucket --region us-east-1

Sanity check that listing buckets shows your bucket:

$ aws s3 ls
2023-02-18 18:14:32 flux-operator-bucket
...

Now copy the entire workflow to a faux “subdirectory” there:

# Run from the workflow directory (e.g., /tmp/workflow)
$ aws s3 cp --recursive . s3://flux-operator-bucket/snakemake-workflow --exclude ".git*"

Sanity check again by listing that path in the bucket:

$ aws s3 ls s3://flux-operator-bucket/snakemake-workflow/

S3 Storage Policy

For our testing case, we made the bucket public and created the following Permission -> Bucket policy:

{
  "Version":"2012-10-17",
  "Statement":[
    {
      "Sid":"AddPerm",
      "Effect":"Allow",
      "Principal": "*",
      "Action": "s3:*",
      "Resource":["arn:aws:s3:::my-bucket-name/*"]
    }
  ]
}

In practice you should, of course, create a policy that is scoped to your own user or IAM role.
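If you do use a (test-only) public bucket policy like the one above, one way to attach it from the command line is with s3api, assuming the JSON is saved as bucket-policy.json with your bucket name substituted:

$ aws s3api put-bucket-policy --bucket flux-operator-bucket --policy file://bucket-policy.json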

Prepare the OIDC Provider for EKS

We will be following this guide. First, get the OIDC issuer for the cluster:

$ aws eks describe-cluster --name flux-operator --query "cluster.identity.oidc.issuer" --output text

Grab the identifier at the end of that URL (e.g., EXAMPLEXXXXXXXXXXXXXXX) and check if you’ve already created the provider. If you have, the following command will have output:

$ aws iam list-open-id-connect-providers | grep EXAMPLED539D4633E53DE1B7
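If you’d rather not copy the identifier by hand, a small sketch that captures it in a shell variable (the issuer URL ends with the id, so we take the fifth path segment):

$ oidc_id=$(aws eks describe-cluster --name flux-operator --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f 5)
$ aws iam list-open-id-connect-providers | grep $oidc_id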

If there is no output, open up the following section to create the OIDC provider:

Create the OIDC provider
$ eksctl utils associate-iam-oidc-provider --cluster flux-operator --approve
2023-02-19 19:56:55 [ℹ]  will create IAM Open ID Connect provider for cluster "flux-operator" in "us-east-1"
2023-02-19 19:56:56 [✔]  created IAM Open ID Connect provider for cluster "flux-operator" in "us-east-1"

Then create a policy.json with your bucket name (you only need to do this once). The s3:* action is important so we have permissions to mount, read, write, etc.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::<your-bucket-name>"
            ]
        }
    ]
}
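Note that the resource above only covers the bucket itself. Depending on how your mounter accesses objects, you may also need the object-level ARN; a variant (our addition, not from the original example) lists both:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::<your-bucket-name>",
                "arn:aws:s3:::<your-bucket-name>/*"
            ]
        }
    ]
}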

Create the policy (here using the example file and bucket we provide):

$ aws iam create-policy --policy-name kubernetes-s3-access --policy-document file://./examples/storage/aws/oidc/policy.json

Once you have the policy, create the otomount namespace:

$ kubectl create namespace otomount

And then use eksctl to create the iam service account:

$ eksctl create iamserviceaccount --name s3-mounter --namespace otomount --cluster flux-operator \
    --role-name "eks-otomounter-role" --attach-policy-arn arn:aws:iam::633731392008:policy/kubernetes-s3-access --approve
2023-03-01 13:32:53 [ℹ]  1 iamserviceaccount (otomount/s3-mounter) was included (based on the include/exclude rules)
2023-03-01 13:32:53 [!]  serviceaccounts that exist in Kubernetes will be excluded, use --override-existing-serviceaccounts to override
2023-03-01 13:32:53 [ℹ]  1 task: { 
    2 sequential sub-tasks: { 
        create IAM role for serviceaccount "otomount/s3-mounter",
        create serviceaccount "otomount/s3-mounter",
    } }
2023-03-01 13:32:53 [ℹ]  building iamserviceaccount stack "eksctl-flux-operator-addon-iamserviceaccount-otomount-s3-mounter"
2023-03-01 13:32:53 [ℹ]  deploying stack "eksctl-flux-operator-addon-iamserviceaccount-otomount-s3-mounter"
2023-03-01 13:32:54 [ℹ]  waiting for CloudFormation stack "eksctl-flux-operator-addon-iamserviceaccount-otomount-s3-mounter"
2023-03-01 13:33:24 [ℹ]  waiting for CloudFormation stack "eksctl-flux-operator-addon-iamserviceaccount-otomount-s3-mounter"
2023-03-01 13:33:24 [ℹ]  created serviceaccount "otomount/s3-mounter"

Check to make sure it worked:

$ aws iam get-role --role-name eks-otomounter-role --query Role.AssumeRolePolicyDocument

And sanity check the attached role policies:

$ aws iam list-attached-role-policies --role-name eks-otomounter-role --query AttachedPolicies[].PolicyArn --output text
arn:aws:iam::633731392008:policy/kubernetes-s3-access

Save that to a variable:

export policy_arn=arn:aws:iam::633731392008:policy/kubernetes-s3-access

Look at the policy again to sanity check the ARN:

$ aws iam get-policy --policy-arn $policy_arn
$ aws iam get-policy-version --policy-arn $policy_arn --version-id v1

Finally, ensure the role shows up alongside the service account:

$ kubectl describe serviceaccount s3-mounter -n otomount

Also ensure that the region your bucket is in matches the resources you are interacting with above! At this point you have the correct service account and policy, and next you need to create a daemonset that will create the mount using that service account.

Install the S3 mounter

This storage driver can be installed with helm, but for reproducibility we will install directly from yaml (that was generated via helm). The difference here is that we have already created the service account. In case you need to see how we generated the original oidc.yaml:

$ helm template s3-mounter otomount/s3-otomount  --namespace otomount --set bucketName="flux-operator-bucket" \
   --set iamRoleARN=arn:aws:iam::633731392008:policy/kubernetes-s3-access --create-namespace > ./examples/storage/aws/oidc/oidc.yaml

To install the mounter pods (which run goofys), we create a daemonset that does the work as follows:

$ kubectl apply -f ./examples/storage/aws/oidc/oidc.yaml

A few notes about this file:

  • the mount permissions assume that root (uid/gid 0) is going to run the workflow, so runFluxAsRoot is set to true

  • the volume needs to be mounted in the pod under /tmp/*, otherwise you won’t be able to clean up

  • the goofys flags are what we used, but it is not clear if all are needed.

Check the pods are running:

$ kubectl get -n otomount pods
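To spot-check one of the mounter pods, you can grab the first pod name in the namespace (a small sketch, since pod names will vary):

$ kubectl logs -n otomount $(kubectl get pods -n otomount -o name | head -n 1)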

Check these logs for any issues with the mount or permissions, and ensure the pods are running with no obvious errors before continuing! Then (assuming you’ve already installed the operator and created the flux-operator namespace):

$ kubectl create -f ./examples/storage/aws/oidc/minicluster.yaml

Whether your mount works (with permission to read and write) is determined largely by the S3 storage policy and rules. For testing, we were irresponsible and made the bucket public, but you likely don’t want to do that. Next, get pods - you’ll see the containers creating and then running:

$ kubectl get -n flux-operator pods
NAME                         READY   STATUS              RESTARTS   AGE
flux-sample-0-f5znt          0/1     ContainerCreating   0          100s
flux-sample-1-th589          0/1     ContainerCreating   0          100s

You can get the output in the terminal (and don’t worry too much about saving it, as it will also be written to storage). By default cleanup is set to false, so the pods will stick around for you to grab logs from:

$ kubectl logs -n flux-operator flux-sample-0-f5znt

Finally, note that with snakemake, once the output files in plots, called_reads, and mapped_reads exist, running the workflow a second time will lead snakemake to determine there is nothing to do.
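If you want to force a fresh run, one option (an assumption about your bucket layout, adjust the paths as needed) is to remove the generated outputs from the bucket first:

$ aws s3 rm --recursive s3://flux-operator-bucket/snakemake-workflow/plots
$ aws s3 rm --recursive s3://flux-operator-bucket/snakemake-workflow/mapped_reads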

Clean up

Make sure you clean everything up! Delete the iam service account (this detaches the role):

$ eksctl delete iamserviceaccount --name s3-mounter --namespace otomount --cluster flux-operator

Delete the Flux Operator using whichever method you used to install it:

$ make undeploy
$ kubectl delete -f examples/dist/flux-operator-dev.yaml
$ kubectl delete -f examples/dist/flux-operator.yaml

If you created roles, you probably want to clean these up too:

$ aws iam delete-role --role-name eks-otomounter-role
$ aws iam delete-policy --policy-arn arn:aws:iam::633731392008:policy/kubernetes-s3-access
$ aws iam delete-open-id-connect-provider --open-id-connect-provider-arn "arn:aws:iam::633731392008:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/xxxxxxxxxxxxxx"

And then delete your cluster (e.g., one of the following):

$ eksctl delete cluster -f examples/storage/aws/oidc/eksctl-config.yaml --wait
$ eksctl delete cluster -f eksctl-config.yaml --wait

Either way, it’s good to check the web console too to ensure you didn’t miss anything.
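As a final command line check (in addition to the web console), you can confirm that no eksctl-created CloudFormation stacks remain for the cluster:

$ aws cloudformation describe-stacks --query "Stacks[?starts_with(StackName, 'eksctl-flux-operator')].StackName" --output text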

