Whether you choose a LAMMPS or Snakemake tutorial, the setup for Google Cloud is the same!
For all tutorials you’ll need to install gcloud, and for storage tutorials you will need to prepare some data in Google Storage.
You should first install gcloud and ensure you are logged in and have kubectl installed:
$ gcloud auth login
Depending on your install, you can either install kubectl with gcloud:
$ gcloud components install kubectl
or install it on your own. I already had it installed, so I was good to go.
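Before moving on, it can help to confirm that both command line tools are actually on your PATH. The following is a minimal sketch, not part of the original tutorial; the function name `missing_tools` is made up for illustration.

```shell
# Print the name of every tool from the argument list that is not on PATH.
# (missing_tools is a hypothetical helper, not part of gcloud or kubectl.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report anything missing; an empty result means both CLIs were found.
missing="$(missing_tools gcloud kubectl)"
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "gcloud and kubectl are both installed"
fi
```

If either tool is reported as missing, install it as described above before continuing.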
This step is only required if you are doing a storage tutorial.
To start, prepare your data in a temporary directory that we will upload to Google Cloud Storage:
$ git clone --depth 1 https://github.com/snakemake/snakemake-tutorial-data /tmp/workflow
You’ll want to add the Snakefile for your workflow along with a plotting script:
$ wget -O /tmp/workflow/Snakefile https://raw.githubusercontent.com/rse-ops/flux-hpc/main/snakemake/atacseq/Snakefile
$ mkdir -p /tmp/workflow/scripts
$ wget -O /tmp/workflow/scripts/plot-quals.py https://raw.githubusercontent.com/rse-ops/flux-hpc/main/snakemake/atacseq/scripts/plot-quals.py
You should have this structure:
$ tree /tmp/workflow
/tmp/workflow/
├── data
│   ├── genome.fa
│   ├── genome.fa.amb
│   ├── genome.fa.ann
│   ├── genome.fa.bwt
│   ├── genome.fa.fai
│   ├── genome.fa.pac
│   ├── genome.fa.sa
│   └── samples
│       ├── A.fastq
│       ├── B.fastq
│       └── C.fastq
├── Dockerfile
├── environment.yaml
├── README.md
├── scripts
│   └── plot-quals.py
└── Snakefile
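If you would rather not compare the tree by eye, a small check like the one below can list anything that is missing. This is only a sketch: the helper name `missing_workflow_files` is hypothetical, and the file list is a subset of the structure shown above, so adjust it if your workflow differs.

```shell
# Print every expected path that is missing under the given workflow directory.
# (missing_workflow_files is a hypothetical helper; the list covers a subset
# of the files in the tree shown above.)
missing_workflow_files() {
    dir="$1"
    for rel in \
        Snakefile \
        environment.yaml \
        scripts/plot-quals.py \
        data/genome.fa \
        data/samples/A.fastq \
        data/samples/B.fastq \
        data/samples/C.fastq
    do
        [ -e "$dir/$rel" ] || printf '%s\n' "$rel"
    done
}

# Prints nothing when all of the listed files are in place.
missing_workflow_files /tmp/workflow
```

Any path that is printed still needs to be cloned or downloaded before you upload.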
We can then use Google Cloud (gcloud) to create a bucket and upload to it.
$ gcloud storage buckets create gs://flux-operator-storage --project=dinodev --location=US-CENTRAL1 --uniform-bucket-level-access
In the above, the storage class defaults to “Standard”. Once we’ve created the bucket, let’s go to our Snakemake data and upload it.
$ cd /tmp/workflow
$ gcloud storage cp --recursive . gs://flux-operator-storage/snakemake-workflow/
You can either view your files in the Google Storage console or list them with gcloud again:
$ gcloud storage ls gs://flux-operator-storage/snakemake-workflow/
gs://flux-operator-storage/snakemake-workflow/.gitpod.yml
gs://flux-operator-storage/snakemake-workflow/Dockerfile
gs://flux-operator-storage/snakemake-workflow/README.md
gs://flux-operator-storage/snakemake-workflow/Snakefile
gs://flux-operator-storage/snakemake-workflow/environment.yaml
gs://flux-operator-storage/snakemake-workflow/data/
gs://flux-operator-storage/snakemake-workflow/scripts/
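The bucket name and project in the commands above belong to the author's setup. Bucket names are globally unique, so you will need your own name, and your own project ID. As a sketch under those assumptions, the helper below (the name `storage_commands` and the example values are hypothetical) prints the same create, upload, and list commands with your values filled in, without running anything.

```shell
# Print (without running) the create, upload, and list commands for a bucket.
# Arguments: bucket name, project ID, location, local directory, bucket prefix.
# (storage_commands is a hypothetical helper; it only echoes the gcloud
# commands used in this section.)
storage_commands() {
    bucket="$1"; project="$2"; location="$3"; src="$4"; prefix="$5"
    echo "gcloud storage buckets create gs://$bucket --project=$project --location=$location --uniform-bucket-level-access"
    echo "cd $src && gcloud storage cp --recursive . gs://$bucket/$prefix/"
    echo "gcloud storage ls gs://$bucket/$prefix/"
}

# Hypothetical values: replace them with your own bucket name and project ID.
storage_commands my-flux-storage my-project US-CENTRAL1 /tmp/workflow snakemake-workflow
```

Printing first gives you a chance to review the commands; once they look right, run them by hand. If you change the bucket name here, remember to use the same name when creating the secret later.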
Permissions via Secrets
For storage tutorials, we need to give the nodes permission to access storage. We can do that by following these instructions to create a service account key (a JSON file) from a service account. E.g., I first created a custom service account that has these permissions:
Note that for the Fusion tutorial I also added Kubernetes admin to that. I could then find the new identifier in a listing:
$ gcloud iam service-accounts list
DISPLAY NAME                            EMAIL                          DISABLED
Compute Engine default service account  270958151865[email protected]  False
flux-operator-gke                       [email protected]               False
And create a credential file:
$ gcloud iam service-accounts keys create <FILE_NAME>.json --iam-account <EMAIL>
And create a secret from it! This is basically giving your cluster permission to interact with a specific bucket.
$ kubectl create secret generic csi-gcs-secret --from-literal=bucket=flux-operator-storage --from-file=key=<PATH_TO_SERVICE_ACCOUNT_KEY>
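Since the secret is built directly from the key file, it may be worth checking that the path really points at a service account key before (or after) running the command above. The sketch below relies on the fact that these key files are JSON with a `"type": "service_account"` field; the helper name `is_service_account_key` and the default path are hypothetical.

```shell
# Return success if the file looks like a service account key:
# it must be readable and contain the "type": "service_account" field.
# (is_service_account_key is a hypothetical helper, not a gcloud command.)
is_service_account_key() {
    [ -r "$1" ] && grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$1"
}

# KEY_FILE is a hypothetical path; point it at the key file you created.
KEY_FILE="${KEY_FILE:-./service-account-key.json}"
if is_service_account_key "$KEY_FILE"; then
    echo "$KEY_FILE looks like a service account key"
else
    echo "$KEY_FILE is missing or is not a service account key"
fi
```

After creating the secret, `kubectl get secret csi-gcs-secret` should show it in your current namespace.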
If you ever need to list your accounts again (e.g., you’ll need the email in the Fusion tutorial):
$ gcloud iam service-accounts list
Next you can proceed to choose one of the other tutorials mentioned here.