Transferring files

Instructor note

Total: 45min (Teaching:30Min | Discussion:0min | Breaks:0min | Exercises:10Min)

Objectives

  • Questions

    • How do I upload/download files to the cluster?

  • Objectives

    • Be able to transfer files to and from a compute-cluster.

  • Keypoints

    • wget downloads a file from the internet.

    • rsync transfers files to and from your computer.

    • You can use Visual Studio Code to transfer files with drag and drop.

A remote computer is of limited use if we cannot get files to and from it. There are several ways to transfer data between computing resources, from command-line tools to GUI programs, and we cover both here.

Downloading from the internet

Download files from the internet using wget

One of the most straightforward ways to download files is to use wget. Any file that can be downloaded in your web browser with an accessible link can be downloaded using wget. This is a quick way to download datasets or source code.

The syntax is: wget https://some/link/to/a/file.tar.gz. For example, download the lesson sample files using the following command:

# To find the value of <URL>, refer to the downloads section of the tutorial

[MY_USER_NAME@CLUSTER_NAME ~]$ wget <URL>

Downloading GitHub repositories

Sometimes the data, pipeline or software you need is stored in a repository on GitHub or GitLab. In this case you can either download individual (“raw”) files using wget or download the whole repository with git clone.

For example, we can download this test repository with:

[MY_USER_NAME@CLUSTER_NAME ~]$ git clone https://github.com/test/HelloWorld.git

The repository is saved in a new folder in the current directory. The folder is named after the repository, so “HelloWorld” in this case.
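The folder naming can be tried without network access. The following is a minimal sketch, not part of the lesson material: it assumes git is installed and uses a throwaway, empty local repository (a hypothetical stand-in created under a temporary directory) in place of the GitHub URL.

```shell
# Hypothetical local demo: an empty bare repository stands in for the GitHub URL.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Create the stand-in "remote" repository, named HelloWorld.git.
mkdir -p remote
git init --quiet --bare remote/HelloWorld.git

# Clone it. git names the new folder after the repository, without ".git".
# (git may warn that the repository is empty; that is expected here.)
git clone --quiet remote/HelloWorld.git

# The clone now exists as a folder in the current directory.
ls -d HelloWorld
```

To choose a different folder name, git clone accepts the target directory as an extra argument after the URL.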

Transferring files

Transferring single files and folders with rsync

To move files between your computer and the clusters we recommend using rsync. This utility allows you to transfer files in an easy and secure manner.

To transfer a single file from your local machine to a cluster, run the following command:

[user@laptop ~]$ rsync -avz /path/to/local/file username@login-1.fram.sigma2.no:/path/to/remote/directory

Replace /path/to/local/file with the path to the file on your local machine, username with your username on the cluster, and /path/to/remote/directory with the path to the remote directory where you want to store the file.

An example:

[user@laptop ~]$ echo $(date) > from_laptop.txt
[user@laptop ~]$ rsync -avz from_laptop.txt username@fram.sigma2.no:

# Log in to Fram and check for the file in your home folder

To download from the cluster:

# Create a file on fram
[MY_USER_NAME@login-1.fram~]$ echo $(hostname) > from_cluster.txt
[MY_USER_NAME@login-1.fram~]$ echo $(date) >> from_cluster.txt

# From the laptop download it
[user@laptop ~]$ rsync -avz username@login-1.fram.sigma2.no:from_cluster.txt .
Password: 


[user@laptop ~]$ cat from_cluster.txt
login-1.fram
ma. 14. mars 19:14:53 +0100 2023

To transfer multiple files or directories from your local machine to a cluster, list all of the sources before the destination:

[user@laptop ~]$ rsync -avz /path/to/local/directory1 /path/to/local/file2 username@cluster:/path/to/remote/directory

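To transfer multiple files or directories from a cluster to your local machine, list each remote source before the local destination:

[user@laptop ~]$ rsync -avz username@login-1.fram.sigma2.no:/path/to/remote/directory1 username@login-1.fram.sigma2.no:/path/to/remote/file2 /path/to/local/directory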

A trailing slash on the target directory is optional and has no effect here, although it can matter in other commands.

A trailing slash on a source directory, however, makes rsync copy only the contents of the folder, not the folder itself.
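The effect of the trailing slash is easiest to see by trying it. The following is a minimal sketch that runs entirely on one machine: the directory and file names (data, with_folder, contents_only, notes.txt) are made up for illustration, and local throwaway directories stand in for the laptop and the cluster.

```shell
# Local-only demo: source and destinations are throwaway directories.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/data" "$demo_dir/with_folder" "$demo_dir/contents_only"
echo "hello" > "$demo_dir/data/notes.txt"

# No trailing slash on the source: the folder itself is copied.
rsync -a "$demo_dir/data" "$demo_dir/with_folder/"

# Trailing slash on the source: only the contents of the folder are copied.
rsync -a "$demo_dir/data/" "$demo_dir/contents_only/"

ls "$demo_dir/with_folder"      # lists: data
ls "$demo_dir/contents_only"    # lists: notes.txt
```

The same rule applies when the source or the destination is a remote path such as username@cluster:/path/to/remote/directory.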

Transferring Large Amounts of Data

When transferring a large amount of data, it’s recommended to use the --partial and --progress options. The --partial option ensures that partially transferred files are kept, allowing you to resume the transfer in case of interruption. The --progress option displays the progress of the transfer. The -P option combines these flags into one.

To transfer a large amount of data from your local machine to the Fram cluster, use the following command:

[user@laptop ~]$ rsync -avzP /path/to/local/directory username@login-1.fram.sigma2.no:/path/to/remote/directory

To transfer a large amount of data from the Fram cluster to your local machine, use this command:

[user@laptop ~]$ rsync -avzP username@login-1.fram.sigma2.no:/path/to/remote/directory /path/to/local/directory

Transferring files using a graphical user interface

While command-line tools like rsync are efficient for transferring files between your computer and the cluster where you want to do your work, they can be intimidating and overwhelming for beginners and inexperienced users. A nice built-in feature of [Visual Studio Code](https://code.visualstudio.com) is that it can act as a GUI for moving files with drag-and-drop.

This is how to use Visual Studio Code as a GUI for file transfer:

Log in using SSH in Visual Studio Code as described here: Connecting to a system with Visual Studio Code. Follow the instructions fully.

Then open a local folder, for instance your Documents folder. To copy files or folders to the remote server, drag them to the left side column in your VS Code window. To copy files back to your client machine, right-click (on Mac, ctrl+left-click) the folder or file you want to download, then choose Download from the dropdown menu.

You can also use VS Code to read the content of files, edit files to some extent, delete files and make new folders and files like you are used to on your local client using GUI tools.

Working with files generated in different environments

When you transfer files between different environments, note that a file created or edited in one environment may be difficult to open or execute in another. A well-known issue is that files transferred from a Windows environment to a Unix system environment (Mac, Linux, BSD, Solaris, etc.) can cause problems. On a Unix system, every line in a file ends with a \n (newline). On Windows, every line in a file ends with a \r\n (carriage return + newline).

Though most modern programming languages and software handle this correctly, in some rare instances you may run into an issue. You can identify whether a file has Windows line endings with cat -A filename. A file with Windows line endings will have ^M$ at the end of every line. A file with Unix line endings will have $ at the end of every line.

The solution is to either edit the file manually in the Unix system environment, or to convert the file from Windows to Unix line endings by running dos2unix filename:

[MY_USER_NAME@CLUSTER_NAME ~]$ dos2unix File-created-on-windows.txt

(Conversely, to convert back to Windows format, you can run unix2dos filename)
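If dos2unix is not available, the difference between the two line endings can still be explored with standard tools. The following is a minimal sketch, assuming only printf, grep and tr: it creates a small file with Windows line endings (the file names windows.txt and unix.txt are made up), counts the affected lines, and strips the carriage returns. Note that tr removes every carriage return in the file, not only those at the ends of lines, so it is a rougher tool than dos2unix.

```shell
demo_dir=$(mktemp -d)

# printf writes the \r\n endings explicitly, mimicking a file saved on Windows.
printf 'first line\r\nsecond line\r\n' > "$demo_dir/windows.txt"

# Count the lines that contain a carriage return.
crlf_lines=$(grep -c "$(printf '\r')" "$demo_dir/windows.txt")
echo "lines with CR: $crlf_lines"

# Strip every carriage return, writing the result to a new file.
tr -d '\r' < "$demo_dir/windows.txt" > "$demo_dir/unix.txt"

# Count again. grep exits non-zero when nothing matches, hence "|| true".
remaining=$(grep -c "$(printf '\r')" "$demo_dir/unix.txt" || true)
echo "lines with CR after conversion: $remaining"
```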

Information that might be useful

About Ports

All file transfers using the above methods use encrypted communication over port 22. This is the same connection method used by SSH. In fact, all file transfers using these methods occur through an SSH connection. If you can connect via SSH over the normal port, you will be able to transfer files.

About Backup

The files in your home folder (/cluster/home) and project folder (/cluster/projects) are regularly backed up to either NIRD or one of the other clusters, as described in the documentation. So if you ever accidentally delete or overwrite a file in one of those folders, you may be able to get your data back.