# Transferring files

```{instructor-note}
Total: 30min (Teaching&Demo:30Min | Discussion:0min | Breaks:0min | Exercises:10Min)
```

```{objectives}
- Questions
  - How do I upload/download files to the cluster?
- Objectives
  - Be able to transfer files to and from a compute-cluster.
- Keypoints
  - `wget` downloads a file from the internet.
  - `rsync`/`scp` transfer files to and from your computer.
  - You can use Visual Studio Code to transfer files with drag and drop.
```

Computing with a remote computer is of very limited use if we cannot get files to and from the cluster. There are several options for transferring data between computing resources, from command-line tools to GUI programs, which we will cover here.

## Downloading from the internet

### Download files from the internet using wget

One of the most straightforward ways to download files is to use `wget`. Any file that can be downloaded in your web browser through an accessible link can be downloaded using `wget`. This is a quick way to download datasets or source code. The syntax is: `wget https://some/link/to/a/file.tar.gz`. For example, download the lesson sample files using the following command:

```console
# To find the value of refer to the downloads section of the tutorial
[MY_USER_NAME@CLUSTER_NAME ~]$ wget
```

### Downloading GitHub repositories

Sometimes the data, pipeline or software you need is stored in a repository on GitHub or GitLab. In this case you can either download individual ("raw") files using `wget` or the whole repository with `git clone`. For example, we can download this [test repository](https://github.com/test/HelloWorld) with:

```console
[MY_USER_NAME@CLUSTER_NAME ~]$ git clone https://github.com/test/HelloWorld.git
```

The repository is saved in a new folder in the current directory, named after the repository (`HelloWorld` in this case).
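You can try this folder naming without network access. The sketch below clones a local, empty bare repository twice. The repository is a hypothetical stand-in for `https://github.com/test/HelloWorld.git`, and `my-hello-copy` is a made-up folder name. The first clone uses the default folder name. The second passes an explicit folder name as the optional second argument to `git clone`.

```bash
#!/usr/bin/env bash
# Sketch: a local, empty bare repository in a temporary directory stands in
# for https://github.com/test/HelloWorld.git, so no network access is needed.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
git init --quiet --bare HelloWorld.git

# Default: the new folder is named after the repository (".git" is dropped)
git clone --quiet "$workdir/HelloWorld.git"

# Optional second argument: choose the folder name yourself
git clone --quiet "$workdir/HelloWorld.git" my-hello-copy

ls    # lists HelloWorld, HelloWorld.git and my-hello-copy
```

With the real repository, the equivalent command would be `git clone https://github.com/test/HelloWorld.git my-hello-copy`.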
## Transferring files

### Transferring single files and folders

To move files between your computer and the clusters, you can use `scp`, `rsync` or other tools. As a best practice, we recommend `rsync`, which lets you transfer files easily and securely. On a Windows system, you need to use `rsync` through [Windows Subsystem for Linux (WSL)](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux). We provide examples for both `rsync` and `scp`.

#### To upload to the cluster:

`````{tabs}
````{tab} scp
To transfer a single file from your local machine to a cluster using `scp`, run the following command:

```console
[user@laptop ~]$ scp /path/to/local/file username@saga.sigma2.no:/path/to/remote/directory
```

The example is for the **saga** cluster. Replace `/path/to/local/file` with the path to the file on your local machine, `username` with your username on the cluster, and `/path/to/remote/directory` with the path to the remote directory where you want to store the file.

An example:

```console
[user@laptop ~]$ echo $(date) > from_laptop.txt
[user@laptop ~]$ scp from_laptop.txt username@saga.sigma2.no:
# Log in to saga and check that the file is in your HOME folder
```
````
````{tab} rsync
To transfer a single file from your local machine to a cluster using `rsync`, run the following command:

```{literalinclude} snippets/saga/15-rsync.txt
:language: bash
```

The example is for the **saga** cluster. Replace `/path/to/local/file` with the path to the file on your local machine, `username` with your username on the cluster, and `/path/to/remote/directory` with the path to the remote directory where you want to store the file.
An example:

```{literalinclude} snippets/saga/15-upload.txt
:language: bash
```
````
`````

#### To download from the cluster:

`````{tabs}
````{tab} scp
```console
# Create a file on SAGA
[username@login-5.SAGA ~]$ echo $(hostname -f) > from_cluster.txt
[username@login-5.SAGA ~]$ echo $(date) >> from_cluster.txt
```

```console
# From the laptop, download it
[user@laptop ~]$ scp username@saga.sigma2.no:from_cluster.txt .
Password:
[user@laptop ~]$ cat from_cluster.txt
login-5.saga
ma. 14. mars 19:14:53 +0100 2023
```

The output from `cat` will vary depending on which login node you were on when you created the file.

You can transfer multiple files or directories from your local machine to the cluster, and vice versa, with `scp`. Use the `-r` option to copy directories recursively. This assumes that there is a single directory containing all of the files you want to transfer (and nothing else). For example, to transfer a directory from your local machine to the cluster:

```console
[user@laptop ~]$ scp -r /path/to/local/directory1 username@saga.sigma2.no:/path/to/remote/directory
```

Alternatively, you can use the wildcard `*` to transfer multiple files. For example, to transfer multiple files from a cluster to your local machine, use this command:

```console
[user@laptop ~]$ scp username@saga.sigma2.no:/path/to/remote/directory1/* /path/to/local/directory2
```

This copies all the files under `directory1` on the cluster to `directory2` on your laptop. Note that `directory1` itself is not transferred, only its contents. Without `-r`, any subdirectories matched by `*` are not copied.
````
````{tab} rsync
```{literalinclude} snippets/saga/15-download.txt
:language: bash
```

To transfer multiple files or directories from your local machine to the cluster, use the following command:

```console
[user@laptop ~]$ rsync -avz /path/to/local/directory1 /path/to/local/file2 username@saga.sigma2.no:/path/to/remote/directory
```

To transfer multiple files or directories from a cluster to your local machine, use this command:

```console
[user@laptop ~]$ rsync -avz username@saga.sigma2.no:/path/to/remote/directory1 /path/to/local/directory
```

A trailing slash on the target directory is optional and has no effect here, but it can be important in other commands. A trailing slash on a source directory, however, makes `rsync` copy only the contents of the folder, not the folder itself.
````
`````

#### Transferring Large Amounts of Data with `rsync`

When transferring a large amount of data, we recommend using the `--partial` and `--progress` options. The `--partial` option keeps partially transferred files. If the transfer is interrupted, you can resume it by running the same command again. The `--progress` option displays the progress of the transfer. The `-P` option combines these two flags into one.

To transfer a large amount of data from your local machine to the `fram` cluster, use the following command:

```console
[user@laptop ~]$ rsync -avzP /path/to/local/directory username@fram.sigma2.no:/path/to/remote/directory
```

To transfer a large amount of data from the `fram` cluster to your local machine, use this command:

```console
[user@laptop ~]$ rsync -avzP username@fram.sigma2.no:/path/to/remote/directory /path/to/local/directory
```

### Transferring files using a graphical user interface

While command-line tools like `rsync` are efficient for transferring files between your computer and the cluster where you want to do your work, they can be intimidating and overwhelming for beginners and inexperienced users.
A nice built-in feature of [*Visual Studio Code*](https://code.visualstudio.com) is that it can act as a GUI for moving files with drag-and-drop.

**This is how to use Visual Studio Code as a GUI for file transfer:**

Log in using SSH in Visual Studio Code as described in [Connecting to a system with Visual Studio Code](https://documentation.sigma2.no/code_development/guides/vs_code/connect_to_server.html?highlight=visual%20studio%20code). Follow the instructions fully. Then open a local folder, for instance your **Documents** folder. To move files or folders to the remote server, drag them to the left-side column in your **VS Code** window. To copy files back to your client machine, either `right-click` or, on Mac, `ctrl+left-click` the folder or file you want to download, then choose `Download` from the dropdown menu. You can also use **VS Code** to read files, edit files to some extent, delete files and create new folders and files, just as you would with GUI tools on your local client.

## Working with files generated in different environments

When you transfer files between different environments, be aware of one potential problem. A file created or edited in one environment can be hard to open or execute in another. A well-known example is files transferred from a Windows environment to a Unix system environment (Mac, Linux, BSD, Solaris, etc.). On a Unix system, every line in a file ends with a `\n` (newline). On Windows, every line in a file ends with a `\r\n` (carriage return + newline). Although most modern programming languages and software handle this correctly, in some rare instances you may run into an issue.

You can identify whether a file has Windows line endings with `cat -A filename`. A file with Windows line endings will have `^M$` at the end of every line.
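To see the difference for yourself, the sketch below creates two small files, one with each kind of line ending, and inspects them with `cat -A`. The file names are hypothetical. It also uses `grep -l` to list the files that contain carriage returns. This is handy for checking a whole folder of transferred files. The sketch assumes the GNU versions of `cat` and `grep` found on the Linux clusters; `cat -A` is not available on macOS.

```bash
#!/usr/bin/env bash
# Sketch with hypothetical file names: one file with Windows (CRLF) line
# endings and one with Unix (LF) line endings, created in a temporary folder.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

printf 'line one\r\nline two\r\n' > windows.txt
printf 'line one\nline two\n' > unix.txt

# GNU cat: -A shows a carriage return as ^M and each line end as $
cat -A windows.txt    # line one^M$ and line two^M$
cat -A unix.txt       # line one$ and line two$

# List the files that contain a carriage return ($'\r' is bash syntax for CR)
grep -l $'\r' windows.txt unix.txt    # prints only windows.txt
```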
A file with Unix line endings will have `$` at the end of each line. The solution is either to edit the file manually in the Unix system environment, or to convert it from Windows to Unix line endings by running `dos2unix filename`:

```console
[MY_USER_NAME@CLUSTER_NAME ~]$ dos2unix File-created-on-windows.txt
```

(Conversely, to convert back to Windows format, you can run `unix2dos filename`.)

## Information that might be useful
### About Ports

All file transfers between your computer and the cluster using the methods above (`scp`, `rsync` and Visual Studio Code) use encrypted communication over port 22. This is the same connection method used by SSH. In fact, all file transfers using these methods occur through an SSH connection. If you can connect via SSH over the normal port, you will be able to transfer files.
### About Backup

The files in your home folder (`/cluster/home`) and project folder (`/cluster/projects`) are regularly backed up to either NIRD or one of the other clusters, as described in the [documentation](https://documentation.sigma2.no/files_storage/backup.html). So if you ever accidentally delete or overwrite a file in one of those folders, you may be able to get your data back.
### About File Transfer from NIRD

The NIRD project storage areas, namely NIRD Data Peak (TS) and NIRD Data Lake (DL), are mounted on the login nodes of Betzy, Fram, and Saga. You can therefore access the NIRD project area directly from the login nodes of these clusters and copy files with the `cp` command. More details are available in the [documentation](https://documentation.sigma2.no/files_storage/nird/mounts_lmd.html).