Fusion with AWS EKS and S3 object storage
Fusion streamlines the deployment of Nextflow pipelines in a Kubernetes cluster because it removes the need to configure and maintain a shared file system in the cluster.
Kubernetes configuration
You need to create a namespace and a service account in your Kubernetes cluster to run the jobs submitted during pipeline execution.
The following manifest shows the bare minimum configuration:
---
apiVersion: v1
kind: Namespace
metadata:
    name: fusion-demo
---
apiVersion: v1
kind: ServiceAccount
metadata:
    namespace: fusion-demo
    name: fusion-sa
    annotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::<YOUR ACCOUNT ID>:role/fusion-demo-role"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
    namespace: fusion-demo
    name: fusion-role
rules:
    - apiGroups: [""]
      resources: ["pods", "pods/status", "pods/log", "pods/exec"]
      verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
    namespace: fusion-demo
    name: fusion-rolebind
roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: Role
    name: fusion-role
subjects:
    - kind: ServiceAccount
      name: fusion-sa
The AWS IAM role referenced in the service account annotation should grant read-write permissions on the S3 bucket used as the pipeline work directory. For example:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::<YOUR-BUCKET>"]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:PutObjectTagging",
                "s3:DeleteObject"
            ],
            "Resource": ["arn:aws:s3:::<YOUR-BUCKET>/*"]
        }
    ]
}
In the above policy, replace <YOUR-BUCKET> with the name of your bucket.
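The policy distinguishes between the bucket ARN (needed for listing) and the object ARN ending in /* (needed for reading, writing, tagging and deleting objects). The following Python sketch renders the same policy for a given bucket name. The function name build_s3_policy and the example bucket name are hypothetical and only serve the illustration.

```python
import json


def build_s3_policy(bucket):
    """Return the read-write policy shown above for a single bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions target the objects inside the bucket.
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:PutObjectTagging",
                    "s3:DeleteObject",
                ],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-fusion-bucket" is a placeholder bucket name.
    print(json.dumps(build_s3_policy("my-fusion-bucket"), indent=4))
```

The printed JSON can be saved to a file and attached to the role as its permissions policy.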
Also, make sure that the role defines a trust relationship similar to this:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::<YOUR ACCOUNT ID>:oidc-provider/oidc.eks.<YOUR REGION>.amazonaws.com/id/<YOUR CLUSTER ID>"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.<YOUR REGION>.amazonaws.com/id/<YOUR CLUSTER ID>:aud": "sts.amazonaws.com",
                    "oidc.eks.<YOUR REGION>.amazonaws.com/id/<YOUR CLUSTER ID>:sub": "system:serviceaccount:fusion-demo:fusion-sa"
                }
            }
        }
    ]
}
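The OIDC provider path in the Federated ARN and in the two condition keys must refer to the same provider, including the region and the cluster ID, and the sub value must match the namespace and service account created earlier. As a minimal sketch, the trust policy can be generated from the cluster's OIDC issuer URL so that these values stay consistent. The function name build_trust_policy, the account ID and the issuer URL below are hypothetical. The issuer URL is assumed to be the value reported for your cluster, for example by aws eks describe-cluster with the query cluster.identity.oidc.issuer.

```python
import json


def build_trust_policy(account_id, issuer_url,
                       namespace="fusion-demo", service_account="fusion-sa"):
    """Return a trust policy that lets one service account assume the role."""
    # The provider is identified by the issuer URL without its scheme,
    # e.g. "oidc.eks.<region>.amazonaws.com/id/<cluster id>".
    issuer = issuer_url.split("://", 1)[-1].rstrip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Both condition keys reuse the same issuer string.
                        f"{issuer}:aud": "sts.amazonaws.com",
                        f"{issuer}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        ),
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and issuer URL, for illustration only.
    policy = build_trust_policy(
        "111122223333",
        "https://oidc.eks.eu-west-2.amazonaws.com/id/EXAMPLECLUSTERID",
    )
    print(json.dumps(policy, indent=4))
```

Deriving every occurrence from one issuer string avoids a mismatch between the region in the principal and the region in the condition keys, which would prevent the service account from assuming the role.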
Nextflow configuration
The minimal Nextflow configuration looks like the following:
wave.enabled = true
fusion.enabled = true
process.executor = 'k8s'
k8s.context = '<YOUR K8S CLUSTER CONTEXT>'
k8s.namespace = 'fusion-demo'
k8s.serviceAccount = 'fusion-sa'
In the above snippet, replace <YOUR K8S CLUSTER CONTEXT> with the name of a context defined in your Kubernetes config, and save the snippet
to a file named nextflow.config in the pipeline launch directory.
Then launch the pipeline execution with the usual run command:
nextflow run <YOUR PIPELINE SCRIPT> -w s3://<YOUR-BUCKET>/work
Replace <YOUR PIPELINE SCRIPT> with the URI of your pipeline Git repository
and <YOUR-BUCKET> with an S3 bucket of your choice.
For best performance, set up an SSD volume as the temporary directory. See the section SSD storage for details.