Data Transfer

How to move files to and from HPC and deal with compressed files

We often need to move files to and from the HPC. This section will cover the movement of files around computers both remote and local.

Since we are moving more toward open science and data sharing, you might find a data set or scripts posted online. The first thing we will be doing is getting a file from the web. This will be accomplished with the wget command in the command line. wget derives it’s name from world wide web and get, which makes sense considering what we are using it for.

We will need the web addresses for the data files that we are going to use. Two data sets are provided in two different formats so you can practice extracting compressed files.

If you have not done so already, open a terminal and connect to the HPC and navigate to the folder/directory where you want this file to go. Use the pwd command to verify your location.

[[email protected] ~]$ cd demo_files
[[email protected] demo_files]$ pwd
/home/Joseph.Madison/demo_files

Once there type in the command line wget and paste the address we copied from the web: usdrcg.github.io/workshops/crimes.zip

You may have trouble pasting, right click and select paste in MobaXterm, DO NOT USE ctrl+V After hitting enter you should see the file download. Verify it is there using ls.

[[email protected] demo_files]$ wget http://usdrcg.github.io/workshops/crimes.zip
--16:03:05-- http://usdrcg.github.io/workshops/crimes.zip
Resolving usdrcg.github.io... 151.101.32.133
Connecting to usdrcg.github.io|151.101.32.133|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16206119 (15M) [application/zip]
Saving to: `crimes.zip'
100% [===============================================
>
] 16,206,119 19.8M/s in 0.8s
16:03:09 (19.8 MB/s) - `crimes.zip' saved [16206119/16206119]
[[email protected] demo_files]$ ls
crimes.zip README water_molecule.txt

Now that we have some files to play with we are going to move them around. To make the next step easier, while in the directory where we just downloaded these files type the Print Working Directory Command like this:

[[email protected] demo_files]$ pwd
/home/Joseph.Madison/demo_files

Copy the output path. As before the cp command is for copy. Similarly we have scp for secure copy. This is used for moving files between computers over the internet. We will be using it to transfer these files between your personal computer to the HPC. To get files from your local computer, you will need to open a local terminal session on your computer (Do not connect to the HPC!, in this example, [email protected] is my local PC).

We will repeat the wget command steps for the election polls file on the Welcome Wagon page for your local PC in the terminal.

[[email protected] ~]$ cd Desktop
[[email protected] Desktop]$ wget http://usdrcg.github.io/workshops/baseball.tar.gz
--16:06:43-- http://usdrcg.github.io/workshops/baseball.tar.gz
Resolving usdrcg.github.io... 151.101.32.133
Connecting to usdrcg.github.io|151.101.32.133|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8893 (8.7K) [application/octet-stream]
Saving to: `baseball.tar.gz'
100%[=================================
>
] 8,893 --.-K/s in 0.04s
16:06:43 (210 KB/s) - `baseball.tar.gz' saved [8893/8893]
[email protected] Desktop]$ ls
baseball.tar.gz

Once you have this file on your local PC, you can move it (any file you want) from your local PC to the HPC. The syntax of the scp command has four parts: 1. scp command, 2. The file you are moving, 3. The path to the file you want to copy, and 4. The target local directory where this file will be downloaded to.

Type the scp command as follows (paste in path that was copied from HPC):

[[email protected] Desktop]$ ls
baseball.tar.gz
[[email protected] Desktop]$ scp baseball.tar.gz [email protected]:/home/Joseph.Madison/demo_files
[email protected]'s password:
baseball.tar.gz 100% 8893 8.7KB/s 00:00

The last line tells you stats on the transfer. You can also use scp for other data transfer applications including HPC to local PC, between HPCs, and between PCs. In a separate terminal where I am logged into the HPC, I can check that the file was transferred and put into demo_files.

[[email protected] demo_files]$ ls
baseball.tar.gz crimes.zip README water_molecule.txt

Now as most things go we showed you the harder way first for transferring files.

Drag-and-Drop Option

If you like moving files using the drag-and-drop method, that is what we are going to do next. Note: using scp on the command line can be faster for some files (including large files). This next file moving method uses sftp, or SSH (or Secure) File Transfer Protocol. For this we will open up MobaXterm or FileZilla. In MobaXterm, the file explorer/hierarchy is visible just to the left of the command line interface.

From that portion of the MobaXterm window, you can drag and drop files between either computer as you desire. Easy!

Another option that you will need to use if you are going to be transferring large volumes of data between locations is a piece of software called Globus. This is required for moving files around 1 TB or larger.

If you look here at their homepage, Globus.org, you will see they have transferred around 350 Petabytes of data (as of Jan. 11, 2018). Just for reference, the all of the information in the library of congress converted to text files is one fourth of a petabyte. Globus will not be covered here today, but if you are interested or have questions feel free to contact us.

Compressed Files

The last thing to mention in this section is the decompression of files. As you might have noticed already, the files we got using wget are in the form of .zip and .tar.gz. (tar.gz is similar to .zip).

Compressed files are a great way to send large files via email, and when you have a directory with directories in it that you want to share as one file. We will be doing this on the HPC.

For .zip files simply move them to the directory where you wish to have the files, make sure you are in that directory, and use the command unzip. This will extract the files from the .zip.

[[email protected] demo_files]$ ls
baseball.tar.gz crimes.zip README water_molecule.txt
[[email protected] demo_files]$ unzip crimes.zip
Archive: crimes.zip
inflating: Police_bounds.dbf
inflating: Police_bounds.prj
inflating: Police_bounds.shp
inflating: Police_bounds.shx
inflating: Police_points.dbf
inflating: Police_points.prj
inflating: Police_points.shp
inflating: Police_points.shx
inflating: ProvincePopulation.csv
inflating: SouthAfricaCrimeStats_v2.csv
[[email protected] demo_files]$ ls
baseball.tar.gz Police_bounds.prj Police_points.dbf Police_points.shx SouthAfricaCrimeStats_v2.csv
crimes.zip Police_bounds.shp Police_points.prj ProvincePopulation.csv water_molecule.txt
Police_bounds.dbf Police_bounds.shx Police_points.shp README

For the other commonly encountered compressed file type, we have the tar file. This can be extracted using the tar -xzvf command:

[[email protected] demo_files]$ ls
baseball.tar.gz crimes.zip README water_molecule.txt
[[email protected] demo_files]$ tar -xvsf baseball.tar.gz
Baseball/
Baseball/batting.csv
Baseball/pitching.csv
Baseball/college.csv
[[email protected] demo_files]$ ls
Baseball baseball.tar.gz crimes.zip README water_molecule.txt

Note that -xzvf has the following meaning: -x, extract; -z gunzip; -v, verbose; -f use a file

You can copy this into Explainshell for further information.

This should cover the majority of your data transferring needs, but if you want to know more feel free to ask us.