Archive Server User Guide

1. Introduction

This document provides an overview of the Mass Storage capabilities of the Air Force Research Laboratory (AFRL) Department of Defense Supercomputing Research Center (DSRC) and how they may be used. The AFRL DSRC is physically located in Building 676 of Area B at Wright-Patterson Air Force Base near Dayton, Ohio.

1.1. Assumed Background of the Reader

It is assumed that the reader of this guide has a firm grasp of the concepts required to use the Linux operating system and to program in C, C++, FORTRAN 77, Fortran 90, or Fortran 95.

1.2. Document Conventions

Unless specified otherwise, % represents the command line prompt. Anything after the % should be entered at the command line prompt in the user's terminal.

file-1 ... file-N (file names) and dir-1 ... dir-N (directory names) are the files and directories to be archived.

1.3. Additional Information

Much of the information presented in this document is available online through the man pages and is accessible while logged in by typing:

man {command name}

2. AFRL DSRC Archive Server

This facility provides long-term storage for user files and data backup files. The Mass Storage servers use Oracle's Sun Storage Archive Manager (SAMFS) to move data files to tape when the files are inactive and to move them back to disk when a user needs to access their contents. The status information for files migrated to tape remains on disk, which gives the appearance that the files are still stored on disk. The Mass Storage server is available from the AFRL DSRC High Performance Application Servers.

The Mass Storage server uses Oracle's SAMFS to maintain the near-online storage system. SAMFS periodically migrates data files to tape but retains the directory information about the files on disk. When a user attempts to access the data in a migrated file (e.g., by attempting to copy the file), SAMFS automatically copies the data file back to disk. The Mass Storage server provides large file-space capacity at the expense of access times that can be much longer than those of disk (e.g., the time needed to read a file back from tape). For this reason, the Mass Storage server is best suited for long-term and permanent storage and for the storage of large files.

Infrequently accessed files, large data files, and file archives of entire directories are best stored on the Mass Storage server. Users should not directly use files located on the Mass Storage server. Changes made by editing or appending to these files may not be saved if one of the Sun servers fails. Users should copy the files from the Mass Storage server to the system where the files are needed and copy the modified files back to the Mass Storage server. Possible destinations are the user's $HOME directory, the global /scratch directory, the Center Wide File System (CWFS), the /workspace directory of an appropriate AFRL DSRC High Performance Server, or space on a local disk (such as a hard disk attached to the user's local workstation).
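
For example, a typical session copies a stored file to a working area, processes it there, and then copies the result back. The directory and file names below (project, input.dat, and the /scratch path) are placeholders only:

% cp $ARCHIVE_HOME/project/input.dat /scratch/username
% cd /scratch/username

(edit or process input.dat locally, then copy it back)

% cp input.dat $ARCHIVE_HOME/project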

NOTE: The AFRL DSRC Mass Storage System is not a replacement for the normal backups made by the system administrators and should not be used as an extension of a user's disk allocation.

2.1. Hardware

The Mass Storage system for the AFRL DSRC (herein denoted as the Mass Storage server) is currently configured with a total capacity of six PBytes of near-online storage. The system consists of two Sun Fire X4600 M2 servers and a SPARC Enterprise T5220 server with a total of 108 TBytes of disk cache. These servers connect via a storage area network (SAN) to an Oracle SL8500 tape silo with 16 T10K-A drives and 18 T10K-B drives. There is also a 1 PByte Center Wide File System (CWFS) to facilitate the movement of data between the HPC platforms, the Utility Server, and the Mass Storage Systems. The CWFS provides local storage for user data files.

3. Using the Mass Storage Server

Users cannot log in directly to the Mass Storage server, but they are allowed to connect via the ftp, rcp, rsh (remote shell), and ssh/scp (secure shell/copy) commands. In addition, the locally installed utilities archive and safeftp provide other interfaces to the Mass Storage server. Each user is given an archive directory on the Mass Storage server, which is network file system (NFS)-mounted on the AFRL DSRC Application Servers. Thus, users can directly access their archive directory from any AFRL DSRC Application Server.

3.1. The Archive Command

The Mass Storage system ($ARCHIVE_HOME) is mounted on the interactive node of the system; however, to improve transfer speeds for batch jobs, $ARCHIVE_HOME is not mounted on the batch nodes. To transfer files from $ARCHIVE_HOME, users must use either non-Kerberized rcp or the archive command. The archive command is a tool that helps users transfer files to and from $ARCHIVE_HOME.

The AFRL DSRC startup scripts create the environment variables ARC and MSAS for the Mass Storage server. ARC contains the path of the user's archive directory and is equivalent to $ARCHIVE_HOME, which is used in the examples in this User's Guide. MSAS contains the hostname of the server on which the user's archive directory is mounted; $ARCHIVE_HOST is used in this User's Guide wherever a machine hostname is required. It is recommended that users reference $ARCHIVE_HOST and $ARCHIVE_HOME rather than hard-coded values to avoid problems when the server name or directory path changes.
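
To confirm that these variables are set in the current session, they can be displayed at the prompt:

% echo $ARCHIVE_HOME
% echo $ARCHIVE_HOST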

Basic Syntax

The basic syntax for the archive command is:

% archive get [getopts] file1 [file2 ...]
% archive put [putopts] file1 [file2 ...]

More information on the archive command can be found online via the archive man page (man archive).
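
For example, the following commands (using placeholder file names) retrieve results.tar from the archive directory into the current directory and then store data.tar in the archive directory:

% archive get results.tar
% archive put data.tar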

Using the Archive Command in the Batch Environment

To transfer files in the batch environment, you will need to use non-Kerberized rcp or the archive command, as shown below:

% /usr/bin/rcp ${msas}:/msas*/foo/bar $WRK/username
% archive get archive_filename local_filename
% archive put local_filename
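
As a minimal sketch only (assuming a PBS batch system; the directives, directory names, and file names below are placeholders), a batch job might retrieve its input from the archive directory at the start of the job and store its results back at the end:

#!/bin/bash
#PBS -l walltime=01:00:00
#PBS -q standard

cd $WORKDIR                                # placeholder for the job's working directory
archive get -C project_data input.tar      # retrieve input from $ARCHIVE_HOME/project_data
tar xf input.tar

# ... run the application here ...

tar cf results.tar output_dir              # package the results
archive put -C project_data results.tar    # store them in $ARCHIVE_HOME/project_data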

3.2. Creating a Directory

Subdirectories are used to organize files within the user’s archive directory. In the examples below, files are copied to and from subdirectories that must already exist.

To create a new subdirectory, use the mkdir command:

% cd $ARCHIVE_HOME
% mkdir dirname

or

% mkdir $ARCHIVE_HOME/dirname

3.3. Linking to the Created Directory

To create a link to the new directory in the user’s current working directory, use the ln command:

% ln -s $ARCHIVE_HOME/dirname ./dirname

where $ARCHIVE_HOME/dirname is the new subdirectory on the Mass Storage server. The user can access the subdirectory by referencing the link dirname.
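
For example, once the link exists, a (placeholder) results file can be copied into the archive subdirectory through the link and its contents listed with ordinary commands:

% cp results.dat ./dirname
% ls -l ./dirname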

Users can also put their data files directly in their $ARCHIVE_HOME directory and create no subdirectories. This arrangement may become unwieldy as more files are stored in the user’s archive directory.

3.4. Using Archive Files

Although rcp, scp and cp can transfer entire directories to and from the Mass Storage server, users are strongly advised to use archive files instead because they are much more efficient for storage and transfer.

Using Tar Files

The tar (tape archive) command creates and extracts from archive files. tar archive files, which store the directory structure as well as the file contents, are recommended for storing directories and collections of related files. For additional information regarding tar files, please type “man tar” at the command line.

In the example below, “tarfile” is the archive file created by the tar command.

% tar -cf tarfile file-1 ... file-N dir-1 ... dir-N

To extract the created tar file (named “tarfile” in this example), type the following:

% tar -xf tarfile
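
To list the contents of an existing tar file without extracting it, use the -t option:

% tar -tf tarfile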

3.5. Methods for Storing and Retrieving Data

The user has several methods for copying files to and from the Mass Storage server: cp and mv, rcp and scp, ftp, the archive command, and safeftp.

In the examples below, the files are copied to and from the user's current working directory for convenience; this is not a requirement. Also, since rcp, scp, ftp, and rsh begin in the user's archive home directory ($ARCHIVE_HOME) on the Mass Storage server, the relative path dirname is equivalent to $ARCHIVE_HOME/dirname. A trailing “.” in a command refers to the current working directory.

cp and mv

The cp command on an NFS-mounted file system is recoverable when a server fails, but it is slower than the other methods. It can also be used in shell scripts. In the examples below, $ARCHIVE_HOME/dirname is the destination directory on the Mass Storage server.

Using cp and mv to store directories or projects:

Step 1: Create a tarfile.

% tar -cf tarfile file-1 ... file-N dir-1 ... dir-N

Step 2: Copy the tarfile for storage.

% cp tarfile $ARCHIVE_HOME/dirname

or move it instead of copying:

% mv tarfile $ARCHIVE_HOME/dirname

Step 3: If the tarfile was copied rather than moved, remove the original from the work directory or the home directory.

% rm tarfile
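
To confirm that the tarfile arrived in the archive directory, it can be listed through the NFS mount (sls, described in Section 3.6, shows additional archive status):

% ls -l $ARCHIVE_HOME/dirname/tarfile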

Using cp to retrieve migrated files and/or directories and untar them:

Step 1: Retrieve the archive file tarfile (or individual files).

% cp $ARCHIVE_HOME/dirname/tarfile .

or, for individual files:

% cp $ARCHIVE_HOME/dirname/file-1 ... $ARCHIVE_HOME/dirname/file-N .

Step 2: Extract the files in the directory where the tarfile was copied.

% tar xf tarfile

Step 3: Remove the original tarfile from the work directory, or from the home directory.

% rm tarfile

rcp and scp

The rcp and scp commands are more complicated to use than cp. Only non-Kerberized rcp can be used in batch scripts, and only Kerberized rcp or scp can be used between the Mass Storage server and machines outside the AFRL DSRC. This means users must use non-Kerberized rcp in any batch scripts.

The full rcp path names required to use rcp on the AFRL DSRC systems are:

System      rcp Path
Spirit      /usr/bin/rcp
Predator    /usr/bin/rcp

Since rcp and scp have the same syntax, only rcp is used in the following examples:

Using rcp to store files and/or directories (without creating a tar file):

% rcp file-1 ... file-N ${msas}:dirname

Using rcp to store directories or projects (with a tar file):

Step 1: Create an archive file using tar.

% tar cf tarfile file-1 ... file-N dir-1 ... dir-N

Step 2: Store the archive file created above.

% rcp tarfile ${msas}:dirname

Step 3: Remove the local copy.

% rm tarfile

Using rcp to retrieve files and/or directories (without a tar file):

% rcp ${msas}:dirname/file-1 ... ${msas}:dirname/file-N .

Using rcp to retrieve directories (with a tar file):

Step 1: Retrieve archive file tarfile.

% rcp ${msas}:dirname/tarfile .

Step 2: Extract from the archive file tarfile.

% tar xf tarfile

Step 3: Remove the tarfile.

% rm tarfile

ftp

ftp commands cannot be incorporated into shell scripts due to security restrictions. Only Kerberized ftp (kftp) can be used to transfer files to and from the Mass Storage server.

Using ftp for storing files and/or directories (without a tar file):

Step 1: Open an ftp session to the Mass Storage server.

% ftp $ARCHIVE_HOST

Step 2: Press the return key to accept the default user name.

Name (msasX:username):<CR>

Step 3: Change to the destination directory dirname on the Mass Storage server.

% cd dirname

Step 4: Put the file(s) to the Mass Storage server (see the NOTE following this procedure regarding mput prompting).

% put file-1

or

% mput file-1 ... file-N

Step 5: End the ftp session:

% quit
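
NOTE: With most ftp clients, mput asks for confirmation before each file is transferred. This interactive prompting can usually be toggled off by issuing the prompt command before mput:

% prompt
% mput file-1 ... file-N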

Using ftp for storing files and/or directories (with a tar file):

Step 1: Create an archive file using tar.

% tar cf tarfile file-1 ... file-N dir-1 ... dir-N

Step 2: Open an ftp session to the Mass Storage server.

% ftp $ARCHIVE_HOST

Step 3: Press the return key to accept the default user name.

Name (msasX:username):<CR>

Step 4: Change to the destination directory dirname on the Mass Storage server.

% cd dirname

Step 5: Store the archive file tarfile on the Mass Storage server.

% put tarfile

Step 6: End the ftp session.

% quit

Step 7: Delete the local copy of the archive file tarfile.

% rm tarfile

Using ftp for retrieving stored files and/or directories:

Step 1: Open an ftp session to the Mass Storage server.

% ftp $ARCHIVE_HOST

Step 2: Press the return key to accept the default user name.

Name (msasX:username):<CR>

Step 3: Change to the source directory dirname on the Mass Storage server.

% cd dirname

Step 4: Retrieve the file(s) from the Mass Storage server.

% get file-1

or

% mget file-1 ... file-N

Step 5: End the ftp session.

% quit

Step 6: Extract from the archive file tarfile.

% tar xf tarfile

Step 7: Remove tarfile.

% rm tarfile

archive Command

The archive command is a locally installed utility that provides a uniform interface to the Mass Storage servers at DoD DSRCs. The archive command can be used from within batch scripts.

Using the archive command to store file(s) (without a tarfile):

% archive put -C dirname file-1 ... file-N

Using the archive command to store directories or projects (using a tarfile):

Step 1: Create an archive file using tar.

% tar cf tarfile file-1 ... file-N dir-1 ... dir-N

Step 2: Store archive file.

% archive put -C dirname tarfile

Step 3: Delete local copy.

% rm tarfile

The archive command can also create archive files using tar implicitly. See the man pages for more details.

Using archive to retrieve files and/or directories (without a tarfile):

% archive get -C dirname file-1 ... file-N

Using archive to retrieve files and/or directories (with a tarfile):

Step 1: Retrieve archive file tarfile.

% archive get -C dirname tarfile

Step 2: Extract from the archive file tarfile.

% tar xf tarfile

Step 3: Remove the local copy of the tarfile.

% rm tarfile

The archive command can also extract from archive files using tar implicitly. See the man pages for more details.

safeftp

The safeftp command is a locally installed utility that provides an interface similar to rcp and scp but uses ftp underneath, which allows for some error checking and restart capability. Because it uses ftp, it cannot be used in batch scripts. Because its syntax is similar to rcp and scp, it is not described further here.

3.6. Other Commands

This section provides a brief description of other commands that may be useful for users on the archive system.

Standard Long Listing (sls)

The sls command is an extended version of the Unix ls command. It can be used to help determine the status of files on the archive server.

The following commands can be used for a single-line listing:

% ls -l $ARCHIVE_HOME/test.dat (NFS mounted)

or

% sls -l $ARCHIVE_HOME/test.dat

produces:

-rw------- 1 user user 1000 Sep 10 12:50 /msas031/user/test.dat

The following command can be used for a two-line listing:

% sls -2 $ARCHIVE_HOME/test.dat

produces:

-rw------- 1 user user 1000 Sep 10 12:50 /msas031/user/test.dat

O-a------- --- sg

In addition to the output of the standard long listing, the “O” in the second line indicates that the file is offline and must be staged back to disk before it can be accessed.

The following command can be used for a detailed listing:

% sls -D $ARCHIVE_HOME/test.dat

produces:

/msas031/user/test.dat mode: -rw------- links: 1 owner: user group: user

length: 1000 inode: 608

offline; archdone;

copy 1: ---- Sep 10 12:51 27eb4.1 sg 000305

access: Sep 10 12:50 modification: Sep 10 12:50

changed: Sep 10 12:50 attributes: Sep 10 13:52

creation: Sep 10 12:50 residence: Sep 10 13:52

Refer to the online man pages for more information regarding sls.

As an alternative, a wrapper for sls exists on the AFRL DSRC Application Servers. This wrapper starts a remote shell on the Mass Storage server and passes any arguments entered to the indicated command. For example:

sls -l is equivalent to rsh $ARCHIVE_HOST 'sls -l'.

NOTE: When using these wrapper utilities, arguments that contain wildcards (e.g., *, ?, [, ]) must be quoted to avoid being expanded by the shell on the local machine.
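
For example, to list all (hypothetically named) .tar files in a subdirectory through the sls wrapper, quote the pattern so that the local shell expands $ARCHIVE_HOME but leaves the wildcard to be expanded on the Mass Storage server:

% sls -l "$ARCHIVE_HOME/dirname/*.tar"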

Non-Kerberized rsh on the AFRL DSRC Application Servers

Mass Storage server commands (such as sls) are also available via rsh or ssh to the Sun servers. The command line for these commands should be of the form:

rsh|ssh $ARCHIVE_HOST 'command options filelist'

Only non-Kerberized rsh can be used in batch scripts. Only Kerberized rsh or ssh can be used to connect to the Mass Storage server from machines outside the AFRL DSRC.

System      rsh Path
Spirit      /usr/bin/rsh
Predator    /usr/bin/rsh
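
For example, to obtain a long listing of a (placeholder) subdirectory of the archive directory from an Application Server:

% rsh $ARCHIVE_HOST 'sls -l dirname'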

4. Customer Service

4.1. Consolidated Customer Assistance Center (CCAC)

For customer assistance, contact the CCAC at:

Web: http://centers.hpc.mil/about/contact.html
Toll Free: 1-877-CCAC-039 (1-877-222-2039)
Local: (937) 255-0679
DSN: 785-0679
E-mail: help@ccac.hpc.mil

If you have any questions regarding the AFRL DSRC, contact the CCAC first. If your problem or question is beyond the scope of the CCAC, they will refer you to the appropriate resource.

4.2. AFRL DSRC Support

In-depth technical inquiries and problems are forwarded to the AFRL DSRC Customer Assistance and Technology Center (CATC), which pursues such inquiries and problems through resolution as rapidly as possible. The AFRL DSRC CATC will attempt to determine the nature of the problem, then identify and coordinate whatever resources are needed to resolve the problem.

4.3. AFRL DSRC Web Site

The AFRL DSRC website (http://www.afrl.hpc.mil) is the best source for current AFRL DSRC information. Some of the topics found on the website include:

Software - Short and long descriptions of current AFRL DSRC applications

Hardware - Information on AFRL DSRC servers and Archival Storage

Documentation - Listings of the AFRL DSRC User Guides

Policy User Guide - The latest policies regarding usage of the AFRL DSRC resources