How to parallelize independent tasks

Sometimes you need to run many similar calculations. Instead of running them one after another, managing many job scripts, or writing elaborate shell scripts, we will learn how to use a workflow manager to run them in parallel where possible.

We will demonstrate and practice this using Snakemake. If your calculation is a for-loop in Python or R or a shell script and it takes too long to run, this session is for you.
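As a rough illustration of the kind of workload meant here, the sketch below processes a few inputs one at a time. The function `process_one` and the input names are hypothetical placeholders and are not part of the example files. No iteration needs the result of another, and that independence is what makes it possible to run them at the same time.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for one independent calculation.
process_one() {
    local input="$1"
    echo "result for ${input}"
}

# Sequential version: each iteration waits for the previous one to finish,
# although no iteration depends on the result of another.
for input in sample-1 sample-2 sample-3; do
    process_one "${input}"
done
```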

This course does not cover parallelization with MPI, OpenMP, multi-threading, or similar techniques.

Preparations

  • Access to a Linux cluster with the Slurm scheduler (it is also fine to just read or watch along)

  • Some familiarity with running calculations on a cluster

  • Snakemake available on the cluster (needed for one of the episodes)
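One possible way to check these prerequisites from a login node is sketched below. The helper `check_command` is a hypothetical name introduced only for this illustration. On many clusters software is provided through environment modules, so `snakemake` may only be found after loading a site-specific module.

```shell
#!/usr/bin/env bash

# Report whether a command can be found on the current PATH.
check_command() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

# sbatch is the Slurm job submission command.
check_command sbatch
# snakemake may require loading a module first.
check_command snakemake
```

If `snakemake` is reported as not found, your cluster's documentation should say how it is provided there.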

How to get the example files to practice with

If you want to type along with the instructors or follow our steps later on your own, you can download and extract the example files like this:

cd
wget https://gitlab.sigma2.no/training/tutorials/independent-jobs-in-parallel/-/raw/main/content/exercise-files/exercise-workflow.tar.gz
tar xf exercise-workflow.tar.gz
cd exercise-workflow
code/download-binary.sh

  • If you only want to browse the files without downloading them, you can find them here.

  • The example contains a set of Slurm scripts. You might need to adapt the #SBATCH lines in the scripts to match your cluster’s configuration.
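To give an idea of what such an adaptation might involve, here is a minimal sketch of the top of a Slurm job script. It is not taken from the example files. The account and partition names and the resource values are placeholders that you would replace with values valid on your cluster.

```shell
#!/usr/bin/env bash

# All values below are placeholders, not settings from the example files.

# Hypothetical project or account name:
#SBATCH --account=myproject
# Hypothetical partition name:
#SBATCH --partition=normal
#SBATCH --job-name=example
# Wall-time limit (hh:mm:ss):
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1G

echo "job started"
```

Which options are required, and which account and partition names exist, differs between clusters, so your cluster's documentation is the authority here.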

How to use this material

We highly recommend that you try to work through all the steps either during the course or in your own time.

Working through the steps hands-on will make it easier to apply the approach to your own computational work.