Scaling¶
This functionality requires Kubernetes 1.27 and later.
While Flux does not natively support scaling or elasticity (yet) we can do some tricks with the Flux Operator to enable it! Specifically:
We tell Flux to create a cluster at the maximum size that is possible (and the broker config sees this many nodes)
We update the resource spec to reflect that.
The cluster cannot be smaller than 1 node, meaning only a broker.
A cluster defined with an initial
size
can be scaled down (and back up) (see the basic example)A cluster cannot be larger than it’s original
size
ormaxSize
You are allowed to start smaller and specify a maxSize (see the expand example)
Your cluster will start at
size
and can go up tomaxSize
The cluster will start with the minimum number of nodes
size
You can go below the original size.
Note
This setup works most effectively in the container->launcher or interactive:true mode, where the broker is started without setting it to a scoped number of tasks for a specific command. When you provide a command (not using interactive or launcher mode) the operator will prepare a "flux submit" with the number of tasks available (the smaller size) and this will be the number of tasks allocated to your command. If you then scaled the cluster up, although the added resources might be seen by the broker or available for you to interact with otherwise, they won't be magically added to the command you initially ran. For this reason, we suggest that scaling is done in launcher or interactive mode, or with a strategy in mind for what to do with the new resources.
Basic Example¶
Starting with a cluster at a maximum size and scaling it up and down
To run this example:
$ minikube start --kubernetes-version=1.27.0
Install the operator, create the namespace, and create the MiniCluster:
kubectl apply -f ./examples/dist/flux-operator.yaml
kubectl apply -f examples/scaling/basic/minicluster.yaml
Check Initial Size¶
Wait until the cluster finishes, and you see the pods are ready to go (Running state):
kubectl get pods
NAME READY STATUS RESTARTS AGE
flux-sample-0-kfl7p 1/1 Running 0 76s
flux-sample-1-2v57b 1/1 Running 0 76s
flux-sample-2-m7n85 1/1 Running 0 76s
flux-sample-3-zkmvq 1/1 Running 0 76s
We recommend (in another terminal) shelling into the broker pod and connecting to the broker’s Flux instance so that you can follow the changes.
$ kubectl exec -it flux-sample-0-xd2gc -- bash
source /mnt/flux/flux-view.sh
flux proxy $fluxsocket bash
Here is how to look at the state of the cluster. When you first create it, we will have 4 pods, and all of them are up.
[root@flux-sample-0 /]# flux resource list
STATE NNODES NCORES NGPUS NODELIST
free 4 40 0 flux-sample-[0-3]
allocated 0 0 0
down 0 0 0
At this point we want to try changing the size.
Ask for a Larger Size¶
- size: 4
+ size: 5
Let’s first try asking for something the operator can’t give us - a larger size. After the change above, do:
$ kubectl apply -f examples/scaling/basic/minicluster.yaml
The reason a larger size isn’t supported is because the Flux broker already has registered N nodes, known by their fully qualified domain name, and we would need to do some tricks to update that configuration to add another one. While this might be possible (and likely will be in the future) for now we don’t support it. Thus, if you do a request for a size that is larger than the originally created maximum size, you’ll see this in the operator logs:
1.6831543179369373e+09 INFO minicluster-reconciler MiniCluster {"Size": 4, "Requested Size": 5}
1.6831543179369428e+09 INFO minicluster-reconciler MiniCluster {"PatchSize": 5, "Status": "Denied"}
Ask for Smaller Size¶
Asking for a smaller size will work! Let’s decrease the original CRD from 4 to 3:
- size: 4
+ size: 3
Apply the CRD again:
$ kubectl apply -f examples/scaling/basic/minicluster.yaml
The first thing you will notice is that a pod is terminating
kubectl get pods
NAME READY STATUS RESTARTS AGE
flux-sample-0-xd2gc 1/1 Running 0 30m
flux-sample-1-tbj7c 1/1 Running 0 30m
flux-sample-2-wbpf9 1/1 Running 0 30m
flux-sample-3-cfs6c 1/1 Terminating 0 26m
When the pod is gone, if (in your second terminal) you look at the resource status, Flux will now report this pod as down.
flux@flux-sample-0:/code$ flux resource list
STATE NNODES NCORES NODELIST
free 3 12 flux-sample-[0-2]
allocated 0 0
down 1 4 flux-sample-3
And importantly, the rest of the cluster keeps running smoothly! We haven’t interrupted the Flux broker or install by changing the size, at least superficially and not running any jobs on the pod that was terminated.
Ask for Larger Size¶
Finally, let’s scale back up! Restore the original CRD size to 4:
- size: 3
+ size: 4
Apply again:
$ kubectl apply -f examples/scaling/basic/minicluster.yaml
And time time you’ll see the container creating:
$ kubectl get -n flux-operator pods
NAME READY STATUS RESTARTS AGE
flux-sample-0-xd2gc 1/1 Running 0 35m
flux-sample-1-tbj7c 1/1 Running 0 35m
flux-sample-2-wbpf9 1/1 Running 0 35m
flux-sample-3-ll76s 0/1 ContainerCreating 0 1s
And when it’s running, Flux will notice it online again. Your full cluster is online again. And that’s it!
$ flux resource list
STATE NNODES NCORES NODELIST
free 4 16 flux-sample-[0-3]
allocated 0 0
down 0 0
We will have a tutorial for expanding a cluster size soon. Flux doesn’t allow the hosts to be greater than nodes currently, so we haven’t added this yet.
Expand Example¶
Starting with a small cluster that is able to grow to a maximum size
To run this example:
$ minikube start --kubernetes-version=1.27.
Install the operator, create the namespace, and create the MiniCluster:
$ kubectl apply -f ./examples/dist/flux-operator.yaml
$ kubectl apply -f examples/scaling/basic/minicluster.yaml
Create the cluster¶
First, apply the CRD to create the MiniCluster. Note that we are asking for a size of 2, but allowing for a maximum size of 4.
apiVersion: flux-framework.org/v1alpha2
kind: MiniCluster
metadata:
name: flux-sample
spec:
# Number of pods to create for MiniCluster to start
size: 2
# Number of pods to allow scaling to (the number that flux will see)
maxSize: 4
# This needs to be in interactive or launcher mode to work
# otherwise we submit as a job (and it will be running under the smaller size number of tasks)
interactive: true
# This is a list because a pod can support multiple containers
containers:
- image: rockylinux:9
$ kubectl apply -f ./examples/scaling/expand/minicluster.yaml
Since our initial size is 2, you’ll see two pods creating:
kubectl get -n flux-operator pods
NAME READY STATUS RESTARTS AGE
flux-sample-0-r2cxt 0/1 ContainerCreating 0 1s
flux-sample-1-bxwbw 0/1 ContainerCreating 0 1s
And then running!
kubectl get -n flux-operator pods
NAME READY STATUS RESTARTS AGE
flux-sample-0-77tnb 1/1 Running 0 102s
flux-sample-1-vfg7x 1/1 Running 0 102s
Theoretically, we can pretend that we brought up a small cluster, ran some smaller part of a workflow, and then needed to scale larger. Let’s pretend to do that now, and change the size up to 4 (the MaxSize):
- size: 2
+ size: 4
Apply again:
$ kubectl apply -f ./examples/scaling/expand/minicluster.yaml
You’ll see the pods create very quickly, and come online to connect to the broker:
NAME READY STATUS RESTARTS AGE
flux-sample-0-fq4kn 1/1 Running 0 34s
flux-sample-1-hx745 1/1 Running 0 34s
flux-sample-2-gt4hs 0/1 ContainerCreating 0 2s
flux-sample-3-ndqvm 1/1 Running 0 2s
We can again exec into the broker pod to inspect what resources Flux sees:
kubectl exec -it flux-sample-0-xd2gc -- bash
source /mnt/flux/flux-view.sh
flux proxy ${fluxsocket} bash
And just like they had been there all along, we have four nodes!
$ flux resource list
STATE NNODES NCORES NODELIST
free 4 16 flux-sample-[0-3]
allocated 0 0
down 0 0
You can then try scaling down, as we did before. The pods will terminate, and Flux will show the pods as down again. If you are curious, you can actually scale lower than the original minimum size. As an example, we can change the size to 1:
- size: 4
+ size: 1
Apply again:
$ kubectl apply -f ./examples/scaling/expand/minicluster.yaml
The broker is resilient and will actually keep running! I found this cool and surprising.
$ flux resource list
STATE NNODES NCORES NODELIST
free 1 4 flux-sample-0
allocated 0 0
down 3 12 flux-sample-[1-3]
Of course, the operator will not let you scale to a size 0, as that would be deleting the jobs. if you want to delete the MiniCluster, just do that. :)