We often need to move files to and from the HPC. This section covers moving files between computers, both remote and local.
Since we are moving more toward open science and data sharing, you might find a data set or scripts posted online. The first thing we will do is get a file from the web, using the wget command on the command line. wget derives its name from "World Wide Web" and "get", which makes sense considering what we are using it for.
We will need the web addresses for the data files that we are going to use. Two data sets are provided in two different formats so you can practice extracting compressed files.
If you have not done so already, open a terminal, connect to the HPC, and navigate to the folder/directory where you want this file to go. Use the pwd command to verify your location.
[[email protected] ~]$ cd demo_files
[[email protected] demo_files]$ pwd
/home/Joseph.Madison/demo_files
Once there, type wget on the command line, followed by a space, and paste the address we copied from the web: usdrcg.github.io/workshops/crimes.zip
You may have trouble pasting. In MobaXterm, right-click and select Paste; DO NOT USE Ctrl+V. After hitting Enter you should see the file download. Verify it is there using ls.
[[email protected] demo_files]$ wget http://usdrcg.github.io/workshops/crimes.zip
--16:03:05-- http://usdrcg.github.io/workshops/crimes.zip
Resolving usdrcg.github.io... 188.8.131.52
Connecting to usdrcg.github.io|184.108.40.206|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16206119 (15M) [application/zip]
Saving to: `crimes.zip'
100% [===============================================>] 16,206,119 19.8M/s in 0.8s
16:03:09 (19.8 MB/s) - `crimes.zip' saved [16206119/16206119]
crimes.zip README water_molecule.txt
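As a minimal sketch of the download step, the snippet below shows two wget options that can be handy here: -P, which sets the directory the file is saved into, and -nc, which skips the download if the file is already there. The destination directory is a hypothetical choice, and the real download is left commented out so the snippet runs without a network connection.

```shell
# Address of the file to download (from this workshop)
url="http://usdrcg.github.io/workshops/crimes.zip"

# Hypothetical destination directory; change it to suit your account
dest="$HOME/demo_files"

# wget names the saved file after the last part of the URL
file=$(basename "$url")

fetch() {
    # $1 = URL, $2 = directory to save into
    # -nc: do not download again if the file already exists
    # -P:  save the file into the given directory
    wget -nc -P "$2" "$1"
}

echo "Would run: wget -nc -P $dest $url"
echo "Expected file: $dest/$file"

# Uncomment to download for real and confirm the file arrived:
# fetch "$url" "$dest" && ls -l "$dest/$file"
```

Running wget with no options, as in the transcript above, simply saves the file into the current directory.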
Now that we have some files to play with, we are going to move them around. To make the next step easier, while in the directory where we just downloaded these files, run the print working directory command, pwd, like this:
[[email protected] demo_files]$ pwd
/home/Joseph.Madison/demo_files
Copy the output path. As we saw before, the cp command is for copy. Similarly, we have scp for secure copy, which is used for moving files between computers over a network. We will be using it to transfer files between your personal computer and the HPC. To send files from your local computer, you will need to open a local terminal session on your computer (do not connect to the HPC! In this example, [email protected] is my local PC).
[[email protected] ~]$
We will repeat the wget steps, this time in the terminal on your local PC, for the second data file on the Welcome Wagon page (baseball.tar.gz in the example below).
[[email protected] ~]$ cd Desktop
[[email protected] Desktop]$ wget http://usdrcg.github.io/workshops/baseball.tar.gz
--16:06:43-- http://usdrcg.github.io/workshops/baseball.tar.gz
Resolving usdrcg.github.io... 220.127.116.11
Connecting to usdrcg.github.io|18.104.22.168|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8893 (8.7K) [application/octet-stream]
Saving to: `baseball.tar.gz'
100%[=================================>] 8,893 --.-K/s in 0.04s
16:06:43 (210 KB/s) - `baseball.tar.gz' saved [8893/8893]
[[email protected] Desktop]$ ls
baseball.tar.gz
Once you have this file on your local PC, you can move it (or any file you want) from your local PC to the HPC. The syntax of the scp command has four parts: 1. the scp command itself, 2. the file you are moving, 3. your username and the address of the computer you are copying to, and 4. the target directory on that computer where the file will be placed. Parts 3 and 4 are written together, joined by a colon. Run the scp command as follows (paste in the path that was copied from the HPC):
[[email protected] Desktop]$ ls
baseball.tar.gz
[email protected]'s password:
baseball.tar.gz 100% 8893 8.7KB/s 00:00
The last line tells you stats on the transfer. You can also use scp for other data transfer applications including HPC to local PC, between HPCs, and between PCs. In a separate terminal where I am logged into the HPC, I can check that the file was transferred and put into demo_files.
baseball.tar.gz crimes.zip README water_molecule.txt
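The scp call described above can be sketched by assembling the command from its parts. This is only an illustration: the username and host name below are hypothetical placeholders, not the real HPC address, so the commands are printed rather than run.

```shell
# Part 2: the file you are moving
file="baseball.tar.gz"

# Part 3: hypothetical login details; substitute your own username and HPC address
user="username"
host="hpc.example.edu"

# Part 4: the directory on the HPC (the path printed by pwd earlier)
target="/home/Joseph.Madison/demo_files"

# Local PC -> HPC: scp <file> <user>@<host>:<target directory>
push_cmd="scp ${file} ${user}@${host}:${target}"
echo "$push_cmd"

# HPC -> local PC: swap source and destination; "." means the current local directory
pull_cmd="scp ${user}@${host}:${target}/${file} ."
echo "$pull_cmd"
```

Both commands are typed in the terminal on your local PC; only the order of source and destination changes.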
As with most things, we showed you the harder way of transferring files first.
If you like moving files using the drag-and-drop method, that is what we are going to do next. Note: using scp on the command line can be faster for some files (including large files). This next method uses sftp, the SSH (or Secure) File Transfer Protocol. For this we will open up MobaXterm or FileZilla. In MobaXterm, the file explorer/hierarchy is visible just to the left of the command line interface.
From that portion of the MobaXterm window, you can drag and drop files between either computer as you desire. Easy!
Another option, which you will need if you are going to be transferring large volumes of data between locations, is a piece of software called Globus. It is required for moving files of around 1 TB or larger.
If you look at their homepage, Globus.org, you will see they have transferred around 350 petabytes of data (as of Jan. 11, 2018). Just for reference, all of the information in the Library of Congress converted to text files is one fourth of a petabyte. Globus will not be covered here today, but if you are interested or have questions feel free to contact us.
The last thing to mention in this section is the decompression of files. As you might have noticed already, the files we got using wget are in the form of .zip and .tar.gz (.tar.gz is similar to .zip).
Compressed files are a great way to send large files via email, and to share a directory that contains other directories as a single file. We will be extracting ours on the HPC.
For .zip files simply move them to the directory where you wish to have the files, make sure you are in that directory, and use the command unzip. This will extract the files from the .zip.
baseball.tar.gz crimes.zip README water_molecule.txt
[[email protected] demo_files]$ unzip crimes.zip
Archive: crimes.zip
inflating: Police_bounds.dbf
inflating: Police_bounds.prj
inflating: Police_bounds.shp
inflating: Police_bounds.shx
inflating: Police_points.dbf
inflating: Police_points.prj
inflating: Police_points.shp
inflating: Police_points.shx
inflating: ProvincePopulation.csv
inflating: SouthAfricaCrimeStats_v2.csv
baseball.tar.gz Police_bounds.prj Police_points.dbf Police_points.shx SouthAfricaCrimeStats_v2.csv
crimes.zip Police_bounds.shp Police_points.prj ProvincePopulation.csv water_molecule.txt
Police_bounds.dbf Police_bounds.shx Police_points.shp README
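Because unzip places everything in the current directory, it can help to preview the archive first and extract it into its own folder. A minimal sketch, assuming crimes.zip is in the current directory; the output folder name crimes is a hypothetical choice:

```shell
archive="crimes.zip"
outdir="crimes"   # hypothetical folder name for the extracted files

if [ -f "$archive" ]; then
    # -l lists the contents of the archive without extracting anything
    unzip -l "$archive"
    # -d extracts into the named directory, creating it if needed
    unzip "$archive" -d "$outdir"
    status="extracted"
else
    echo "$archive not found; download it with wget first"
    status="missing"
fi
```

This keeps the extracted shapefiles and CSV files together instead of mixing them with the other files in demo_files.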
The other commonly encountered compressed file type is the tar file. It can be extracted using the tar -xzvf command:
baseball.tar.gz crimes.zip README water_molecule.txt
[[email protected] demo_files]$ tar -xzvf baseball.tar.gz
Baseball/
Baseball/batting.csv
Baseball/pitching.csv
Baseball/college.csv
Baseball baseball.tar.gz crimes.zip README water_molecule.txt
-xzvf has the following meaning: -x, extract; -z, gunzip; -v, verbose; -f, use a file
You can copy this into Explainshell for further information.
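To see these options in action without touching the workshop files, the following sketch builds a tiny archive in a temporary directory, lists its contents with -t, and then extracts it with -xzvf. The Demo directory and its file are made-up examples.

```shell
# Work in a throwaway directory so nothing in demo_files is changed
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Make a small directory to archive (made-up example content)
mkdir Demo
echo "player,hits" > Demo/example.csv

# -c create, -z compress with gzip, -f write to the named archive file
tar -czf demo.tar.gz Demo
rm -r Demo

# -t lists the contents without extracting, a useful check before unpacking
listing=$(tar -tzf demo.tar.gz)
echo "$listing"

# -x extract, -z gunzip, -v verbose, -f read from the named archive file
tar -xzvf demo.tar.gz
ls Demo
```

Listing with -t first shows whether the archive unpacks into a single top-level directory, as baseball.tar.gz does, or spills files into the current directory.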
This should cover the majority of your data transferring needs, but if you want to know more feel free to ask us.