# Monitoring

## Monitoring GPUs on LUMI-G with rocm-smi
To monitor GPUs on LUMI-G during the execution of your SLURM job, you can use the `rocm-smi` command. Run it through `srun` with the `--overlap` option, which allows you to execute commands on the nodes already allocated to your running job. Detailed information about using `--overlap` on LUMI-G is available here.
### Steps to Monitor GPUs
1. **Identify the allocated nodes.** Find out which nodes are allocated to your job with the following command, replacing `<jobid>` with the ID of your SLURM job:

   ```shell
   sacct --noheader -X -P -oNodeList --jobs=<jobid>
   ```

2. **Execute `rocm-smi`.** Once you have the node names (e.g., `nid00XXXX`), run `rocm-smi` to monitor the GPU usage, replacing `<node_name>` with the actual node identifier:

   ```shell
   srun --overlap --pty --jobid=<jobid> -w <node_name> rocm-smi --showuse  # replace with your desired option
   ```
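The two steps above can be combined into a small convenience function. This is a hedged sketch, not part of the official LUMI tooling: the `monitor_job_gpus` name is ours, and `sacct` and `srun` are assumed to be on `PATH`, as they are on LUMI login nodes.

```shell
# Hypothetical helper combining the two steps: resolve the job's node
# list with sacct, then attach rocm-smi to it via srun --overlap.
monitor_job_gpus() {
    local jobid=$1
    if [ -z "$jobid" ]; then
        echo "usage: monitor_job_gpus <jobid>" >&2
        return 1
    fi
    local nodelist
    # NodeList may come back as a hostlist expression (e.g. nid[005252-005254]);
    # srun -w accepts that form directly, so no expansion is needed here.
    nodelist=$(sacct --noheader -X -P -o NodeList --jobs="$jobid")
    srun --overlap --pty --jobid="$jobid" -w "$nodelist" rocm-smi --showuse
}
```

Usage: `monitor_job_gpus <jobid>` from a login node while the job is running.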
## Adding GPU Monitoring to a Job Script on LUMI-G
Monitoring GPU usage on the LUMI-G cluster can provide you with valuable insights into the performance and efficiency of your GPU-accelerated applications. By integrating ROCm-SMI (Radeon Open Compute System Management Interface) into your SLURM job script, you can collect GPU utilization statistics throughout the runtime of your job. Follow these instructions to modify your existing job script to include GPU monitoring.
### Basic Job Script Without GPU Monitoring
Below is a typical example of a job script (`basic_job_script.sh`) without GPU monitoring:
```bash
#!/bin/bash
#SBATCH --account=project_X_____X
#SBATCH --time=00:06:00
#SBATCH --partition=standard-g  # ROCm-SMI will not work with partial allocations on dev-g
#SBATCH -o %x-%j.out

# Load necessary modules

srun your_program
```
For more information on job scripts for LUMI-G, see the LUMI documentation.
### Script for Expanding Node Ranges
To monitor specific GPUs, we must first resolve the SLURM node range into individual node names. The following script, `expand_nodes.sh`, will be used in the job script to accomplish this:
```
 1 #!/bin/bash
 2
 3 # Function to expand a node range (e.g., nid[005252-005254]) into a list of nodes
 4 expand_node_range() {
 5     local node_range=$1
 6     # Check if the input contains a range denoted by square brackets
 7     if [[ "$node_range" == *"["* ]]; then
 8         # Extract the part before "[" as a prefix (e.g., "nid")
 9         local prefix=${node_range%%\[*}
10         # Extract the numerical range
11         local range_numbers=${node_range#*\[}
12         # Remove trailing "]"
13         range_numbers=${range_numbers%]*}
14
15         # Use '-' as the delimiter to split the range into start and end numbers
16         local IFS='-'
17         read -r start end <<< "$range_numbers"
18
19         # Calculate the width needed for zero padding based on the start number
20         local width=${#start}
21         # Generate the sequence of node names with proper zero padding
22         for (( i=10#$start; i <= 10#$end; i++ )); do
23             # Output each node name with the prefix and padded sequence number
24             printf "${prefix}%0${width}d\n" "$i"
25         done
26     else
27         # Output the node name as is if it's not a range
28         echo "$node_range"
29     fi
30 }
31
32 # Check if a node range argument is provided
33 if [ $# -eq 1 ]; then
34     # Call the function with the given node range
35     expand_node_range "$1"
36 else
37     # Display usage information if no arguments are given
38     echo "Usage: $0 <node_range>"
39     exit 1
40 fi
```
Key elements of the `expand_nodes.sh` script:

- The `expand_node_range` function (line 4) takes a string representing a range of nodes and expands it into individual node names.
- It checks for the presence of "[" to determine whether the input is a range (line 7).
- It extracts the prefix and the numerical range (lines 9-13).
- A for loop (lines 22-25) iterates through the range and generates node names with proper zero padding.
Be sure to make the script executable before using it in your job script:

```shell
chmod +x expand_nodes.sh
```
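As a quick sanity check, the parameter expansions used by `expand_node_range` can be traced by hand on a sample range (the node names below are hypothetical):

```shell
# Trace the expansion steps of expand_node_range on a sample input
node_range='nid[005252-005254]'

prefix=${node_range%%\[*}          # everything before "[":  nid
range_numbers=${node_range#*\[}    # everything after "[":   005252-005254]
range_numbers=${range_numbers%]*}  # drop the trailing "]":  005252-005254

# Split on '-' into start and end, and derive the zero-padding width
IFS='-' read -r start end <<< "$range_numbers"
width=${#start}                    # 6

# Generate the padded node names
for (( i=10#$start; i <= 10#$end; i++ )); do
    printf "${prefix}%0${width}d\n" "$i"
done
# prints nid005252, nid005253, nid005254 (one per line)
```

Note that the `10#` prefix forces base-10 arithmetic; without it, a leading zero would make bash interpret `005252` as an (invalid) octal number.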
### Modified Job Script with GPU Monitoring
The following job script, `monitored_job_script.sh`, has been enhanced to include GPU monitoring. The monitoring is encapsulated in a function and runs concurrently with the main job.
```
 1 #!/bin/bash
 2 # Insert the original SBATCH directives here
 3 # ...
 4 #SBATCH -o monitored_job_script-%j.out
 5
 6 # Load necessary modules
 7 # ...
 8
 9 # Define the GPU monitoring function
10 gpu_monitoring() {
11     local node_name=$(hostname)
12     local monitoring_file="gpu_monitoring_${SLURM_JOBID}_node_${node_name}.csv"
13
14     echo "Monitoring GPUs on $node_name"
15     rocm-smi --csv --showuse --showmemuse | head -n 1 > "$monitoring_file"
16
17     while squeue -j ${SLURM_JOBID} &>/dev/null; do
18         rocm-smi --csv --showuse --showmemuse | sed '1d;/^$/d' >> "$monitoring_file"
19         sleep 30  # Change this value to adjust the monitoring frequency
20     done
21 }
22
23 export -f gpu_monitoring
24
25 nodes_compressed="$(sacct --noheader -X -P -o NodeList --jobs=${SLURM_JOBID})"
26 nodes="$(./expand_nodes.sh "$nodes_compressed")"
27 for node in $nodes; do
28     srun --overlap --jobid="${SLURM_JOBID}" -w "$node" bash -c 'gpu_monitoring' &
29 done
30
31 # Run the main job task
32 srun your_program
```

Note that the header row on line 15 is produced with the same flags as the data rows on line 18, so the CSV columns line up.
Key elements of the `monitored_job_script.sh` script:

- We define a `gpu_monitoring` function (line 10) to capture GPU usage data.
- The `--csv` flag in the `rocm-smi` command (line 18) formats the output as comma-separated values, making it easier to parse and analyze later.
- The loop on lines 17-20 ensures that GPU data is captured at regular intervals until the job completes.
- The function is exported (line 23) so that it can be called on the other nodes of the job.
- The `nodes` variable (line 26) holds the expanded list of node names.
- The monitoring is initiated on each node using `srun` (line 28).
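The `sed '1d;/^$/d'` filter on line 18 is worth a brief illustration: every `rocm-smi` invocation repeats its CSV header, so the filter drops that first line plus any blank lines before appending, leaving only data rows in the file. A minimal demonstration on fabricated input (the column names below are placeholders, not the exact `rocm-smi` header):

```shell
# Each rocm-smi call emits a header line; sed '1d;/^$/d' drops that
# first line plus any blank lines, keeping only the data rows.
sample=$'device,GPU use (%)\ncard0,42\n\ncard1,17'   # fabricated sample output
filtered=$(printf '%s\n' "$sample" | sed '1d;/^$/d')
printf '%s\n' "$filtered"
# -> card0,42
# -> card1,17
```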
**Note on ROCm-SMI flags:** the `--showuse` and `--showmemuse` flags passed to `rocm-smi` show GPU utilization and memory usage, respectively. They can be substituted or extended with other flags relevant to the specific monitoring requirements of your job. Using the `--csv` format ensures that the output is easily readable and can be processed with standard data analysis tools after the job has concluded.
### Submitting the Modified Job Script
To submit the job script with GPU monitoring enabled, use the following SLURM command:
```shell
sbatch monitored_job_script.sh
```
### Reviewing the Monitoring Data
Upon completion of your job, you can review the collected GPU usage and performance data. For each node in the job, you will find a CSV file named `gpu_monitoring_<jobid>_node_<nodename>.csv`. It contains metrics sampled at regular intervals, allowing you to assess GPU usage over the duration of the job.
Analyze the CSV data files using your preferred data processing tool to gain insights into the GPU resource utilization and identify potential bottlenecks or inefficiencies in your application’s performance.
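For example, a first-pass summary can be produced with standard command-line tools. This sketch averages the second column of a fabricated monitoring file; the real column layout depends on the `rocm-smi` flags you used, so adjust the field index and column names accordingly:

```shell
# Build a fabricated monitoring file (the column names are placeholders)
cat > gpu_monitoring_sample.csv <<'EOF'
device,GPU use (%)
card0,25
card0,75
EOF

# Average the utilization column, skipping the header row
avg=$(awk -F, 'NR > 1 { sum += $2; n++ } END { printf "%.1f", sum / n }' gpu_monitoring_sample.csv)
echo "Average GPU use: ${avg}%"
# -> Average GPU use: 50.0%
```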
Note to Users: the provided scripts for GPU monitoring serve as an adaptable framework. Depending on the specific requirements of your workload, you may need to modify them, for example by changing the frequency of data capture, the metrics collected, or how the node expansion is handled. Use the scripts as a starting point and tailor them to the monitoring challenges of an HPC environment like LUMI-G.