Data Transfer
How to move files to and from the HPC and deal with compressed files
We often need to move files to and from the HPC. This section will cover the movement of files around computers both remote and local.
As open science and data sharing become more common, you will often find a data set or scripts posted online. The first thing we will do is get a file from the web, using the wget command on the command line. wget derives its name from World Wide Web and get, which makes sense considering what we are using it for.
We will need the web addresses for the data files that we are going to use. Two data sets are provided in two different formats so you can practice extracting compressed files.
If you have not done so already, open a terminal, connect to the HPC, and navigate to the folder/directory where you want this file to go. Use the pwd command to verify your location.
Once there, type wget on the command line, followed by a space, and paste the address we copied from the web: usdrcg.github.io/workshops/crimes.zip
If you have trouble pasting in MobaXterm, right-click and select paste. DO NOT USE ctrl+V. After hitting enter you should see the file download. Verify it is there using ls.
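The download-and-check steps above can be sketched as a small shell function. This is only an illustration, not part of the workshop material: the function name fetch_and_check is made up here, and the address is the one given above.

```shell
# Hypothetical helper (the name is made up for this sketch):
# download one file into the current directory, then list it
# to confirm that it arrived.
fetch_and_check() {
    url="$1"
    # By default wget saves the file under the last part of the address
    file="$(basename "$url")"
    wget "$url" && ls -lh "$file"
}

# Usage (run from the directory where the file should go):
#   fetch_and_check usdrcg.github.io/workshops/crimes.zip
```

Defining the function does not download anything; the download only happens when you call it as shown in the usage comment.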
Now that we have some files to play with, we are going to move them around. To make the next step easier, while in the directory where we just downloaded these files, type the print working directory command, pwd.
Copy the output path. As before, the cp command is for copy. Similarly, we have scp for secure copy, which is used for moving files between computers over the internet. We will be using it to transfer these files between your personal computer and the HPC. To get files from your local computer, you will need to open a local terminal session on your computer. Do not connect to the HPC! (In this example, kvasir7@sc2-139 is my local PC.)
In the local terminal, we will repeat the wget steps for the election polls file on the Welcome Wagon page, so that the file is saved on your local PC.
Once you have this file on your local PC, you can move it (or any file you want) from your local PC to the HPC. The syntax of the scp command has four parts: 1. the scp command, 2. the file you are moving, 3. your user name and the host name of the HPC, written as user@host, and 4. the path to the target directory on the HPC where the file will be copied to, joined to part 3 with a colon.
Type the scp command as follows (paste in the path that was copied from the HPC):
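Because the exact command depends on your account, here is a minimal sketch of how the pieces fit together. Every value in it is a placeholder assumed for illustration (the file name election_polls.tar.gz, the user your_username, the host hpc.example.edu, and the home directory path); only the demo_files directory name comes from this walkthrough. The sketch builds the command and prints it so you can check it before running it yourself in your local terminal.

```shell
# All values below are placeholders; replace them with your own.
local_file="election_polls.tar.gz"               # file on your local PC (hypothetical name)
remote_login="your_username@hpc.example.edu"     # your HPC user name and host (hypothetical)
remote_dir="/home/your_username/demo_files"      # path copied from pwd on the HPC (hypothetical)

# scp takes a source and a destination; a remote side is written login:path
scp_cmd="scp ${local_file} ${remote_login}:${remote_dir}/"

# Print the command so it can be checked before it is run
echo "$scp_cmd"
```

Once the printed line looks right, copy it to the prompt and press enter; scp will ask for your HPC password unless key-based login is set up.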
The last line of the output reports statistics on the transfer. You can also use scp for other kinds of transfer, including HPC to local PC, between HPCs, and between PCs. In a separate terminal where I am logged into the HPC, I can check that the file was transferred and put into demo_files.
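For the opposite direction, HPC to local PC, the remote side simply becomes the source. The sketch below again uses made-up placeholders (the login your_username@hpc.example.edu and the file results.txt are assumptions, not names from the workshop) and only prints the command for checking.

```shell
# Placeholders; replace with your own login, path, and file name.
remote_login="your_username@hpc.example.edu"               # hypothetical
remote_file="/home/your_username/demo_files/results.txt"   # hypothetical

# Run this on your local PC: the remote file is the source, and "."
# (the current local directory) is the destination.
scp_download="scp ${remote_login}:${remote_file} ."
echo "$scp_download"
```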
As often happens, we showed you the harder way of transferring files first.
Drag-and-Drop Option
If you like moving files using the drag-and-drop method, that is what we are going to do next. Note: using scp on the command line can be faster for some files (including large files). This next method uses SFTP, the SSH (or Secure) File Transfer Protocol. For this we will open up MobaXterm or FileZilla. In MobaXterm, the file explorer/hierarchy is visible just to the left of the command line interface.
From that portion of the MobaXterm window, you can drag and drop files between the two computers in either direction. Easy!
Another option, which you will need if you are transferring large volumes of data between locations, is a piece of software called Globus. This is required for moving files of around 1 TB or larger.
If you look at their homepage, Globus.org, you will see they have transferred around 350 petabytes of data (as of Jan. 11, 2018). Just for reference, all of the information in the Library of Congress converted to text files is one fourth of a petabyte. Globus will not be covered here today, but if you are interested or have questions, feel free to contact us.
Compressed Files
The last thing to mention in this section is the decompression of files. As you might have noticed already, the files we got using wget are in the form of .zip and .tar.gz (.tar.gz is similar to .zip).
Compressed files are a great way to send large files via email, and to share a directory that contains other directories as one file. We will be doing this on the HPC.
For .zip files, simply move them to the directory where you wish to have the files, make sure you are in that directory, and use the unzip command followed by the file name (for example, unzip crimes.zip). This will extract the files from the .zip.
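As a minimal sketch of that step, the lines below check that crimes.zip (the file downloaded earlier with wget) is in the current directory, list its contents, and then extract it. The existence check is an addition for safety, not something the workshop steps require.

```shell
zip_file="crimes.zip"   # the file downloaded earlier with wget

if [ -f "$zip_file" ]; then
    unzip -l "$zip_file"   # list the contents without extracting
    unzip "$zip_file"      # extract into the current directory
    ls                     # confirm the extracted files are there
else
    echo "$zip_file not found in $(pwd)"
fi
```

If the file is not found, check your location with pwd and move to the directory where you ran wget.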
The other commonly encountered compressed file type is the tar file. It can be extracted using the tar -xzvf command followed by the name of the file.
Note that -xzvf has the following meaning: -x, extract; -z, decompress with gzip (gunzip); -v, verbose; -f, use the archive file named next.
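To see these flags in action without touching real data, the sketch below packs a small throwaway directory into a .tar.gz and extracts it again. The names demo_dir and note.txt are made up for this example. The -c flag, which creates an archive, is not covered above but is the standard counterpart to -x.

```shell
# Work in a throwaway directory so nothing real is touched
workdir="$(mktemp -d)"
cd "$workdir"

# Make a small directory with a subdirectory and one file (stand-in data)
mkdir -p demo_dir/sub
echo "hello" > demo_dir/sub/note.txt

# -c create, -z compress with gzip, -f name of the archive file:
# pack the whole directory into one file
tar -czf demo_dir.tar.gz demo_dir

# Remove the original directory, then extract it back from the archive
rm -r demo_dir
tar -xzvf demo_dir.tar.gz

# The file is back, including the subdirectory it lived in
cat demo_dir/sub/note.txt
```

Because of -v, tar prints each path as it is extracted, which is a quick way to see what an archive contained.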
You can copy this into Explainshell for further information.
This should cover the majority of your data transferring needs, but if you want to know more feel free to ask us.