Introduction to using Conda on NRIS resources

Instructor note

Total: 45min (Teaching:45Min | Discussion:0Min | Breaks:0Min | Exercises:0Min)

Objectives

  • Objectives

  • Keypoints

    • Use the Anaconda/Miniconda modules to get conda

    • Do not modify your $HOME/.bashrc file to always load conda

    • Avoid using conda init

    • Do not use your home directory as the conda cache

    • Use --prefix to specify where to install environments

Software on NRIS

Discussion

How do you access software on NRIS managed HPC systems? (Pause for 5 min for learners to respond)

Answer here

Software modules

On NRIS managed HPC systems, the recommended way to access software is through the installed modules. These modules are optimized for the infrastructure and have been through some basic tests to make sure they work. In addition, the centrally installed modules are used by other users, and the EasyBuild system used to install them is tested by other HPC communities. This means that if you use the module system to load the software you need, the support staff will be able to help you better.

When to use CONDA

  • You are developing software and need to keep track of specific versions of libraries

  • There is no module for the software (or version) you are looking for and there is no EasyBuild recipe to install it

  • You found some instructions on the internet or from a colleague that show how to set up a piece of software or a pipeline with conda, and you just want to try it out without spending too much effort

  • You need a specific version of a piece of software that is easiest to get through conda

  • More ideas - please write them in the shared document

How to get CONDA on NRIS managed systems

  1. Use the Anaconda or Miniconda module to get the conda base, i.e. do not install it yourself

# To load Miniconda

# Clear any loaded modules.
# It is important not to mix conda with other modules,
# to improve reproducibility and make troubleshooting
# easier
user@SAGA ~]$ module purge

# Load the module. The version shown below may change
# depending on the system you are on and
# with time, as we upgrade modules.
# Use 'module avail miniconda' to
# check which modules are available
user@SAGA ~]$ module load  Miniconda3/22.11.1-1

# Activate the base environment. Without this step
# you would not be able to use the environments created
user@SAGA ~]$ source $EBROOTMINICONDA3/bin/activate

(22.11.1-1) user@SAGA ~]$ 
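The load-then-activate sequence above can be wrapped so that a forgotten `module load` is reported clearly. This is only a minimal sketch: `activate_conda_base` is a hypothetical helper name, and it assumes that loading the Miniconda3 module sets `EBROOTMINICONDA3`, which the `source` command above relies on.

```shell
# Hypothetical helper (not provided by NRIS): activate the conda base
# environment only if the Miniconda3 module has been loaded.
activate_conda_base() {
    # Assumption: the module sets EBROOTMINICONDA3 when it is loaded
    if [ -z "${EBROOTMINICONDA3:-}" ]; then
        echo "EBROOTMINICONDA3 is not set; load the Miniconda3 module first" >&2
        return 1
    fi
    # '.' is the portable spelling of 'source'
    . "$EBROOTMINICONDA3/bin/activate"
}
```

After `module purge` and `module load`, calling `activate_conda_base` would then replace the explicit `source` line.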

How to configure conda

We have limits on the size and the maximum number of files allowed in home directories. Conda is one reason these limits get exceeded without users explicitly placing files there. So if you use conda and the command dusage reports that you are using more than you are allowed, the first thing to check is the size of the .conda directory.

user@SAGA ~]$ du -hs $HOME/.conda
1,4G	/cluster/home/user/.conda

# The output above shows that conda is using 1.4 GB in the home directory.
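Since the quota covers the number of files as well as the size, it can help to check both at once. The following is a minimal sketch using only standard tools; `conda_dir_usage` is a hypothetical helper name, not an NRIS command.

```shell
# Hypothetical helper: print the total size and the number of regular
# files in a directory (defaults to $HOME/.conda).
conda_dir_usage() {
    dir="${1:-$HOME/.conda}"
    if [ ! -d "$dir" ]; then
        echo "$dir does not exist"
        return 0
    fi
    # Total size in human-readable form
    du -sh "$dir"
    # Number of regular files; tr strips padding that some wc versions add
    echo "files: $(find "$dir" -type f | wc -l | tr -d ' ')"
}
```

Running `conda_dir_usage` without arguments checks `$HOME/.conda`; passing another directory checks that one instead.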

To keep the home directory from filling up, we can use the following settings.

  • Point the package cache to a temporary location that is not counted against the quota. We recommend using the directory /cluster/work/users/$USER/conda.

  • Point the installation directory to the project area, e.g. /cluster/projects/nn9999k/conda. This has the additional advantage that other project members can also use the installation.

# Create a directory (-p also creates the parent directory `conda` if it is missing)
user@SAGA ~]$ mkdir -p /cluster/work/users/$USER/conda/cache

# Create a file called `$HOME/.condarc` and include the location of the new cache
# and where to place the environments.
# See an example of the content below. Please note that the example shows
# a project that you probably do not have access to. Use the command `projects`
# to find out the values you could use

user@SAGA ~]$ cat $HOME/.condarc 

pkgs_dirs:
  - /cluster/work/users/$USER/conda/cache

envs_dirs:
  - /cluster/projects/nn9999k/conda

# After this change, it is important to log out and log in again.
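The configuration file can also be generated from the shell instead of being typed by hand. This is a minimal sketch under stated assumptions: `write_condarc` is a hypothetical helper name, and the project path in the usage note is the placeholder from the example above, which must be replaced with one of your own projects.

```shell
# Hypothetical helper: write a conda configuration file.
#   $1 = file to write (normally $HOME/.condarc)
#   $2 = package cache directory
#   $3 = directory for environments
write_condarc() {
    target="$1"
    cache_dir="$2"
    envs_dir="$3"
    # Refuse to overwrite an existing configuration
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target" >&2
        return 1
    fi
    # Make sure the cache directory exists, including its parents
    mkdir -p "$cache_dir"
    cat > "$target" <<EOF
pkgs_dirs:
  - $cache_dir

envs_dirs:
  - $envs_dir
EOF
}
```

A call such as `write_condarc "$HOME/.condarc" "/cluster/work/users/$USER/conda/cache" /cluster/projects/nn9999k/conda` would produce a file like the example above, with `$USER` already expanded by the shell.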

How to create an environment

# Load and activate the Conda module, if you have not already done so
user@SAGA ~]$ module load  Miniconda3/22.11.1-1
user@SAGA ~]$ source $EBROOTMINICONDA3/bin/activate

# Create the environment under the location you used for
# `envs_dirs` in your `$HOME/.condarc` (see above for details)

(base) user@SAGA ~]$ conda create python=3.10 --prefix=/cluster/projects/nn9999k/conda/python310

How to use a package details file to create an environment

When more than one package needs to be installed in an environment, the recommended practice is to first create a file listing the packages. This makes it easier to remember what was installed, and the file can be shared so that someone else can reproduce the environment.

# Create a plain text file with the package list

user@SAGA ~]$ cat python310-from-file.yml

name: python310-from-file
channels:
  - defaults
dependencies:
  - python=3.10
  - numpy
  - pandas
  - scipy

# Load and activate the Conda module, if you have not already done so
user@SAGA ~]$ module load  Miniconda3/22.11.1-1
user@SAGA ~]$ source $EBROOTMINICONDA3/bin/activate

# Create the environment with a prefix
# Please note the `(base)` at the beginning, which indicates that conda is active, and the use of --prefix
(base) user@SAGA ~]$ conda env create --prefix /cluster/projects/nn9999k/conda/python310-from-file --file python310-from-file.yml
 

How to use an environment created with conda

# Load and activate the Conda module, if you have not already done so
user@SAGA ~]$ module load  Miniconda3/22.11.1-1
user@SAGA ~]$ source $EBROOTMINICONDA3/bin/activate

# Find the environment name
(base) user@SAGA ~]$ conda env list
# conda environments:
#
python310                /cluster/projects/nn9999k/conda/python310
python310-from-file      /cluster/projects/nn9999k/conda/python310-from-file
base                  *  /cluster/software/Miniconda3/23.5.2-0

# To activate the environment called "python310"
(base) user@SAGA ~]$ conda activate /cluster/projects/nn9999k/conda/python310

# After using the environment, it is important to deactivate it before doing other work
# that does not use this environment. Alternatively, you could just log out and log in
# again. When using conda inside a job, this step is not needed, unless you
# need to use more than one environment.

(/cluster/projects/nn9999k/conda/python310) user@SAGA ~]$ conda deactivate
(base) user@SAGA ~]$ 
(base) user@SAGA ~]$  conda deactivate
user@SAGA ~]$ module purge
The following modules were not unloaded:
  (Use "module --force purge" to unload all):

  1) StdEnv
user@SAGA ~]$  
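When the prompt is ambiguous, the environment variable `CONDA_PREFIX`, which conda sets on activation, shows whether an environment is still active. The following is a minimal sketch; `active_conda_env` is a hypothetical helper name, and the sketch assumes the variable is unset again once every environment, including base, has been deactivated.

```shell
# Hypothetical helper: report which conda environment, if any, is active
# in the current shell, based on the CONDA_PREFIX variable.
active_conda_env() {
    if [ -n "${CONDA_PREFIX:-}" ]; then
        echo "active: $CONDA_PREFIX"
    else
        echo "no conda environment active"
    fi
}
```

Calling `active_conda_env` before starting unrelated work is a quick check that both `conda deactivate` steps above were carried out.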

Some things to avoid

  • Don’t use the packages or the Python installed in the base environment, i.e. the packages and Python you get when the module is loaded. These packages and the version of Python may change if the system administrators change the module.

  • Do not add conda artifacts to the $HOME/.bashrc file

# Let's say you did not follow the exact steps and forgot to source the activate script
# after loading the module. You will get an error, and conda will suggest using 'conda init'
user@SAGA ~]$ module load Miniconda3/23.5.2-0
user@SAGA ~]$ conda activate /cluster/projects/nn9999k/conda/python310

CondaError: Run 'conda init' before 'conda activate'

PLEASE DO NOT DO THIS. For more details about this, please read:

Search Anaconda site
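If `conda init` was run at some point, it leaves a block in the shell startup file that begins with the line `# >>> conda initialize >>>`. The following is a minimal sketch for detecting that block; `has_conda_init` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given file contains the block
# that 'conda init' writes, and fail otherwise (also if the file is missing).
has_conda_init() {
    grep -q '^# >>> conda initialize >>>' "$1" 2>/dev/null
}
```

For example, `has_conda_init "$HOME/.bashrc" && echo "conda init block found"` reports whether the file needs cleaning up. The block ends at the line `# <<< conda initialize <<<`.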

Example

How to use conda in a job

When you need an environment in a job, first create the environment on a login node. In other words, you should not try to create environments inside a job. You may use already created environments inside a job. When doing this, follow the same module loading and environment activation steps as above. See the example below.

#!/usr/bin/env bash


#SBATCH --account=<Your account>
#SBATCH --job-name=test_conda
#SBATCH --qos=devel
#SBATCH --ntasks=1
#SBATCH --time=02:00


# the actual module version might be different

module purge
module load Miniconda3/23.5.2-0

source $EBROOTMINICONDA3/bin/activate

# As an example, to activate the environment we activated
# interactively above

conda activate /cluster/projects/nn9999k/conda/python310

python --version

python myanalysis.py
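Because the environment has to exist before the job starts, a job script can check for it and stop early with a clear message. This is a minimal sketch, not part of the NRIS example: `require_conda_env` is a hypothetical helper name, and the check relies on the `conda-meta` subdirectory that conda keeps inside every environment.

```shell
# Hypothetical helper: fail with a message if the given directory does not
# look like a conda environment (no conda-meta subdirectory).
require_conda_env() {
    env_dir="$1"
    if [ ! -d "$env_dir/conda-meta" ]; then
        echo "conda environment not found: $env_dir" >&2
        return 1
    fi
}
```

In the job script above, a line such as `require_conda_env /cluster/projects/nn9999k/conda/python310 || exit 1` could be placed just before `conda activate`.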