Guide to the PBS Queuing System
on Raptor
Table of Contents
- 1. Introduction
- 2. The Anatomy of a Batch Script
- 2.1. Specify Your Shell
- 2.2. Required PBS Directives
- 2.3. The Execution Block
- 3. Submitting Your Job
- 4. Simple Batch Script Example
- 5. Job Management Commands
- 6. Optional PBS Directives
- 6.1. Job Identification Directives
- 6.2. Job Environment Directives
- 6.3. Reporting Directives
- 6.4. Job Dependency Directives
- 6.5. AFRL DSRC Directives
- 7. Environment Variables
- 7.1. PBS Environment Variables
- 7.2. Other Important Environment Variables
- 8. Example Scripts
- 8.1. MPI Script
- 8.2. MPI Script (accessing more memory per process)
- 8.3. OpenMP Script
- 8.4. Hybrid MPI/OpenMP Script
1. Introduction
On large-scale computers, many users must share available resources. Because of this, you cannot just log on to one of these systems, upload your programs, and start running them. Essentially, your programs (called batch jobs) have to "get in line" and wait their turn, and there is more than one of these lines (called queues) for them to wait in. Some queues have a higher priority than others (like the express checkout at the grocery store). The queues available to you are determined by the projects that you are involved with.
The jobs in the queues are managed and controlled by a batch queuing system, without which, users could overload systems, resulting in tremendous performance degradation. The queuing system will run your job as soon as it can while still honoring the following:
- Meeting your resource requests
- Not overloading systems
- Running higher priority jobs first
At the AFRL DSRC, we use the PBS Professional queuing system. The PBS module should be loaded automatically for you at login, allowing you access to the PBS commands.
2. The Anatomy of a Batch Script
A batch script is simply a small text file that can be created with a text editor such
as vi or notepad. You may create your own from scratch, or start with one of the sample
batch scripts available in $SAMPLES_HOME. Although the specifics of a batch
script will differ slightly from system to system, a basic set of components is always
required, and a few components are just always good ideas. The basic components of a
batch script must appear in the following order:
- Specify Your Shell
- Required PBS Directives
- Optional PBS Directives and Defaults
- The Execution Block
Note: PBS does not handle ^M (carriage return) characters well. Scripts created on an MS
Windows system should be transferred to the HPC systems in ASCII mode, or else converted
with dos2unix before use.
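As a minimal sketch of the conversion step, assuming dos2unix is not at hand, the standard tr utility can strip the carriage returns. The function names strip_cr and has_cr are hypothetical, and the snippet is written for sh or bash rather than csh.

```shell
# strip_cr: hypothetical helper that copies a script while removing
# carriage-return (^M) characters left by Windows editors.
# Usage: strip_cr INPUT_FILE OUTPUT_FILE
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# has_cr: succeeds (exit status 0) if the file contains a ^M character.
# Usage: has_cr FILE
has_cr() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}
```

A script could be checked with has_cr before submission and cleaned with strip_cr only when needed.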
2.1. Specify Your Shell
First of all, remember that your batch script is a script. It is a good idea to specify
which shell your script is written in. Unless you specify otherwise, PBS will use your
default login shell to run your script. To tell PBS which shell to use, start your
script with a line similar to the following, where shell is either bash,
sh, ksh, csh, or tcsh:
#!/bin/shell
2.2. Required PBS Directives
The next block of your script will tell PBS about the resources that your job needs by
including PBS directives. These directives are actually a special form of comment,
beginning with #PBS. As you might suspect, the # character
tells the shell to ignore the line, but PBS reads these directives and uses them to
set various values.
IMPORTANT: All PBS directives MUST come before the first line of executable code in your script, otherwise they will be ignored.
Every script must include directives for the following:
- The required number of cores
- The maximum amount of time your job should run
PBS also provides additional optional directives. These are discussed in Optional PBS Directives, below.
Note: While it is technically possible to specify PBS directives via the command line, we strongly suggest that you include them in your batch script instead. Doing so minimizes the opportunity for typing errors, allows you to see what you submitted later, and allows us to more easily assist you if something goes wrong.
Number of Nodes and Processes Per Node
Before PBS can schedule and start the job, it needs to know the required number of
cores or the required number of chunks of a specified core count (chunks may equate
to nodes). In addition, the directives may specify number of processes (affects entries
in the $PBS_NODEFILE file). In general, you would specify one process per
core, but you might want more or fewer processes depending on the programming model you
are using. See Example Scripts (below) for alternate use
cases.
The following directive:
#PBS -l select=N1:ncpus=N2:mpiprocs=N3
specifies N1 chunks, each with N3 processes executing on N2
cores. If N1 is 1, then N1: may be omitted. If N3
equals N2, then :mpiprocs=N3 may be omitted.
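The omission rules above can be sketched as a small shell function. This is only an illustration written for sh or bash; the names select_directive and total_procs are hypothetical and are not part of PBS.

```shell
# select_directive: hypothetical helper that prints a "#PBS -l select=..."
# line for N1 chunks, N2 cores per chunk, and N3 processes per chunk,
# dropping the parts that may be omitted.
# Usage: select_directive N1 N2 N3
select_directive() {
    chunks=$1
    ncpus=$2
    mpiprocs=$3
    spec="ncpus=$ncpus"
    # "N1:" may be omitted when N1 is 1
    if [ "$chunks" -ne 1 ]; then
        spec="$chunks:$spec"
    fi
    # ":mpiprocs=N3" may be omitted when N3 equals N2
    if [ "$mpiprocs" -ne "$ncpus" ]; then
        spec="$spec:mpiprocs=$mpiprocs"
    fi
    echo "#PBS -l select=$spec"
}

# total_procs: total number of processes requested, N1 * N3.
# Usage: total_procs N1 N3
total_procs() {
    echo $(( $1 * $2 ))
}
```

For example, select_directive 4 32 16 prints a request for four chunks of 32 cores with 16 processes each, and total_procs 4 16 gives the matching process count for the launch command.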
How Long to Run
Next, PBS needs to know how long your job will run. For this, you will have to make an estimate. There are three things to keep in mind:
- Your estimate is a limit. If your job has not completed within your estimate, it will be terminated.
- Your estimate will affect how long your job waits in the queue. In general, shorter jobs will run before longer jobs.
- Each queue has a maximum time limit. You cannot request more time than the queue allows.
To specify how long your job will run, include the following directive:
#PBS -l walltime=HHH:MM:SS
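If your estimate is in minutes, the walltime string can be produced with a few lines of shell. The sketch below is written for sh or bash, and the function name walltime_from_minutes is hypothetical.

```shell
# walltime_from_minutes: hypothetical helper that turns a run-time
# estimate in minutes into the hours:minutes:seconds form used by
# "-l walltime=". Hours are not capped at two digits.
# Usage: walltime_from_minutes MINUTES
walltime_from_minutes() {
    minutes=$1
    hours=$(( minutes / 60 ))
    mins=$(( minutes % 60 ))
    printf '%02d:%02d:00\n' "$hours" "$mins"
}
```

Because the estimate is a hard limit, it is usually safer to round the input up rather than down before converting it.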
2.3. The Execution Block
Once the PBS directives have been supplied, the execution block may begin. This is the section of your script that contains the actual work to be done. A well written execution block will generally contain the following stages:
- Environment Setup - This might include setting environment variables, creating directories, copying files, initializing data, etc. As the last step in this stage, you will generally cd to the directory that you want your script to execute in. Otherwise, your script would execute by default in your home directory. Most users use "cd $PBS_O_WORKDIR" to run the batch script from the directory where they typed qsub to submit the job.
- Compilation - You may need to compile your application if you do not already have a pre-compiled executable available.
- Launching - Your application is launched as appropriate.
- Clean up - This usually includes archiving your results and removing unwanted files and directories.
3. Submitting Your Job
Once your batch script is complete, you will need to submit it to PBS for execution
using the qsub command. For example, if you have saved your script into a
text file named run.pbs, you would type "qsub run.pbs".
Occasionally you may want to supply one or more directives directly on the qsub command
line. Directives supplied in this way override the same directives if they are already
included in your script. The syntax to supply directives on the command line is the
same as within a script except that #PBS is not used. For example:
qsub -l walltime=HH:MM:SS run.pbs
4. Simple Batch Script Example
The batch script below contains all of the required directives and common script components discussed above.
#!/bin/csh
## Specify your shell
## Required PBS Directives --------------------------------------
#PBS -l select=ncpus=16:mpiprocs=16
#PBS -l walltime=12:00:00
## Execution Block ----------------------------------------------
# Environment Setup
# Change to job-specific directory in $WORKDIR
cd $WORK_DIR

# Launching
# copy executable from $HOME and submit it
cp $HOME/a.out .
mpirun -np 16 ./a.out > output_file

# Clean up
# archive your results
archive mkdir -C $ARCHIVE_HOME $JOBID
archive put -C $ARCHIVE_HOME/$JOBID output_file
5. Job Management Commands
The table below contains commands for managing your jobs in PBS.
| Command | Description |
|---|---|
| qsub | Submit a job. |
| qstat | Check the status of a job. |
| qstat -q | Display the status of all PBS queues. |
| show_queues | A more user-friendly version of qstat -q. |
| qdel | Delete a job. |
| qhold | Place a job on hold. |
| qrls | Release a job from hold. |
| tracejob | Display job accounting data from a completed job. |
| pbsnodes | Display host status of all PBS batch nodes. |
6. Optional PBS Directives
In addition to the required directives mentioned above, PBS has many other directives, but most users will only use a few of them. Some of the more useful directives are listed below.
6.1. Job Identification Directives
Job identification directives allow you to identify characteristics of your jobs. These directives are voluntary, but strongly encouraged. The following table contains some useful job identification directives.
| Directive | Options | Description |
|---|---|---|
| -l application | application_name | Identify the application being used |
| -N | job_name | Name your job. |
-l application
Allows you to identify the application
being used by your job. To use this directive, add a line in the following form to
your batch script:
#PBS -l application=application_name
or
qsub -l application=application_name
-N job_name
Allows you to designate a name for your job. A name is easier to remember than a
numeric job ID, and the PBS environment variable $PBS_JOBNAME inherits this value,
so it can be used instead of the job ID to create job-specific output directories.
For example:
qsub -N job_20 run.pbs
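A job-specific output directory built from the job name might be sketched as follows. The function name job_output_dir and the fallback label unnamed_job are hypothetical; the snippet is written for sh or bash.

```shell
# job_output_dir: hypothetical helper that prints a job-specific directory
# path under a base directory, preferring the job name set with -N and
# falling back to the job ID when no name is available.
# Usage: job_output_dir BASE_DIR
job_output_dir() {
    base=$1
    label=${PBS_JOBNAME:-${PBS_JOBID:-unnamed_job}}
    echo "$base/$label"
}

# Inside a batch script one might then write:
#   outdir=$(job_output_dir "$PBS_O_WORKDIR")
#   mkdir -p "$outdir" && cd "$outdir"
```

Note that two jobs submitted with the same name would share a directory under this scheme, so the job ID remains the safer choice when names repeat.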
6.2. Job Environment Directives
Job environment directives allow you to control the environment in which your script will operate. The following table contains a few useful job environment directives.
| Directive | Options | Description |
|---|---|---|
| -I | | Request an interactive batch shell. |
| -V | | Export all environment variables to the job. |
| -v | variable_list | Export specific environment variables to the job. |
-I
Allows you to request an interactive batch shell.
Within that shell, you can perform normal Unix commands, including launching parallel
jobs. Adding -I to the qsub command-line queues an interactive batch job.
For example:
qsub -l select=ncpus=16:mpiprocs=16 -l walltime=00:30:00 -q debug -I
-V
Tells PBS to export all of the environment
variables from your login environment to the environment in which your script will run.
For example:
qsub -V run.pbs
-v
Tells PBS to export specific environment
variables from your login environment to the environment in which your script will run.
For example:
qsub -v DISPLAY run.pbs
or
qsub -v my_variable=my_value
6.3. Reporting Directives
Reporting directives allow you to control what happens to standard output and standard error messages generated by your script. They also allow you to specify e-mail options to be executed at the beginning and end of your job.
Redirecting Stdout and Stderr
By default, messages written to stdout and stderr are
captured for you in files named x.ojob_id and x.ejob_id,
where x is either the name of the script or the name specified with the
-N directive, and job_id is the ID of the job. If you want
to change this behavior, the -o and -e directives allow you
to redirect stdout and stderr messages to different named files. The -j
directive allows you to combine stdout and stderr into the same file.
| Directive | Options | Description |
|---|---|---|
| -e | File name | Define standard error file. |
| -o | File name | Define standard output file. |
| -j | oe | Stderr and stdout are merged into stdout. |
| -j | eo | Stderr and stdout are merged into stderr. |
Setting Up E-mail Alerts
Many users want to be notified when their jobs begin and end. The -m b
and -m e directives make this possible. If you use these directives, you
will also need to supply the -M directive with one or more e-mail addresses
to be used.
| Directive | Options | Description |
|---|---|---|
| -m | b | Send e-mail when the job begins. |
| -m | e | Send e-mail when the job ends. |
| -M | E-mail address(es) | Send mail to address(es). |
Example:
#PBS -m be
#PBS -M joesmith@gmail.com,joe.smith@us.army.mil
6.4. Job Dependency Directives
Job dependency directives allow you to specify dependencies that your job may have on other jobs. These directives will generally take the following form:
#PBS -W depend=dependency_expression
where dependency_expression is a comma-delimited list of one
or more dependencies, and each dependency is of the form:
type:jobid[:jobid...]
where type is one of the directives listed below, and jobid[:jobid...]
is a list of one or more job IDs that your job is dependent upon.
| Directive | Description |
|---|---|
| after | Execute this job after listed jobs have begun. |
| afterok | Execute this job after listed jobs have terminated without error. |
| afternotok | Execute this job after listed jobs have terminated with an error. |
| afterany | Execute this job after listed jobs have terminated for any reason. |
| before | Listed jobs may be run after this job begins execution. |
| beforeok | Listed jobs may be run after this job terminates without error. |
| beforenotok | Listed jobs may be run after this job terminates with an error. |
| beforeany | Listed jobs may be run after this job terminates for any reason. |
Example: run a job after completion (success or failure) of job ID 1234:
#PBS -W depend=afterany:1234
Example: run a job after successful completion of job ID 1234:
#PBS -W depend=afterok:1234
For more information about job dependencies, see the qsub man page.
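The type:jobid[:jobid...] form described above can be assembled with a short shell function. This is a sketch for sh or bash; the name depend_directive is hypothetical, and the job IDs in the comments are made up.

```shell
# depend_directive: hypothetical helper that prints a
# "#PBS -W depend=type:jobid[:jobid...]" line for one dependency type.
# Usage: depend_directive TYPE JOBID [JOBID...]
depend_directive() {
    dep=$1
    shift
    for id in "$@"; do
        dep="$dep:$id"
    done
    echo "#PBS -W depend=$dep"
}

# The same expression can be given on the command line. Since qsub prints
# the new job's ID, two steps could be chained like this:
#   first=$(qsub step1.pbs)
#   qsub -W depend=afterok:$first step2.pbs
```

Here step1.pbs and step2.pbs are placeholder script names; the second job would start only if the first one finishes without error.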
6.5. AFRL DSRC Directives
The following directives are optional on AFRL PBS installations.
Project Allocation Charging
PBS needs to know to which project allocation to charge the job's resource usage. The
command show_usage displays the ID(s) and remaining allocation(s) of the
user's active project(s). In the show_usage output, project IDs are in
the column Subproject.
To specify the Project ID for your job, include the following directive:
#PBS -A project_ID
To Which Queue Job is Submitted
PBS submits the job to an initial destination (queue and/or server), and the job
ultimately resides in a queue on a server in which it runs. By default on AFRL DSRC
PBS installations, this is the queue standard on the server for the local
system. All users have access to the queues standard, debug,
and background. In the queues challenge, high,
and urgent, jobs reside in the queue standard with higher initial priority
than standard class jobs based on the job's project ID. The queues standard
and debug should be used for normal day-to-day and debugging jobs.
The queue background has the lowest priority, but jobs that run in this
queue do not charge to the job's project allocation. Users may choose to run in the
queue background for several reasons:
- You do not care how long it takes for your job to begin running.
- You are trying to conserve your allocation.
- You have used up your allocation.
To see the list of queues available on the system, type the show_queues
command. To specify the queue you want your job to run in, include the following
directive:
#PBS -q queue_name
How Nodes Should Be Allocated
The resource job_type specifies the job's node allocation requirements.
Acceptable values are SMP, MPI, and MIX, with a
default value of MPI. Type SMP requires all cores on a single
node. Type MPI allows cores to be distributed among multiple nodes. Type
MIX identifies jobs with both OpenMP and MPI requirements.
Specification:
#PBS -l job_type=MPI
Software Resource
For some application software, concurrent limits are enforced with a finite number of licensing elements specific to each vendor. Jobs requiring software licenses may contain directives to specify the application and number of licenses required.
Specification:
#PBS -l application=license_count
7. Environment Variables
7.1. PBS Environment Variables
While there are many PBS environment variables, you only need to know a few important ones to get started using PBS. The table below lists the most important PBS environment variables and how you might generally use them.
| PBS Variable | Description |
|---|---|
| $PBS_JOBID | Job identifier assigned to job or job array by the batch system. |
| $PBS_O_LOGNAME | Value of LOGNAME from submission environment. |
| $PBS_O_WORKDIR | The absolute path of directory where qsub was executed. |
| $PBS_JOBNAME | The job name supplied by the user. |
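When $PBS_JOBID is used in file names, the server part is often unwanted. The sketch below assumes job IDs of the usual PBS Professional form sequence_number.server_name; the function name job_sequence and the server name in the example are hypothetical. It is written for sh or bash.

```shell
# job_sequence: hypothetical helper that prints everything before the
# first "." in a job ID, e.g. the leading sequence number of 1234.server.
# Usage: job_sequence JOB_ID
job_sequence() {
    echo "${1%%.*}"
}

# Inside a batch script one might then write:
#   log="run_$(job_sequence "$PBS_JOBID").log"
```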
The following additional PBS variables may be useful to some users.
| PBS Variable | Description |
|---|---|
| $PBS_ARRAY_ID | Identifier for job arrays. Consists of sequence number. |
| $PBS_ARRAY_INDEX | Index number of subjob in job array. |
| $PBS_ENVIRONMENT | Indicates job type: PBS_BATCH or PBS_INTERACTIVE |
| $PBS_JOBDIR | Pathname of job-specific staging and execution directory. |
| $PBS_NODEFILE | Filename containing a list of vnodes assigned to the job. |
| $PBS_NODENUM | Logical vnode number of this vnode allocated to the job. |
| $PBS_O_HOME | Value of HOME from submission environment. |
| $PBS_O_HOST | Host name on which the qsub command was executed. |
| $PBS_O_LANG | Value of LANG from submission environment. |
| $PBS_O_MAIL | Value of MAIL from submission environment. |
| $PBS_O_PATH | Value of PATH from submission environment. |
| $PBS_O_QUEUE | The original queue name to which the job was submitted. |
| $PBS_O_SHELL | Value of SHELL from submission environment. |
| $PBS_O_SYSTEM | The operating system name where qsub was executed. |
| $PBS_O_TZ | Value of TZ from submission environment. |
| $PBS_QUEUE | The name of the queue from which the job is executed. |
| $PBS_TASKNUM | The task (process) number for the job on this vnode. |
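The file named by $PBS_NODEFILE can be summarized to check how processes were laid out. The sketch below assumes the file holds one host name per line, with a host repeated once for each process placed on it; the function names procs_per_node and total_entries are hypothetical. It is written for sh or bash.

```shell
# procs_per_node: hypothetical helper that summarizes a node file such as
# the one named by $PBS_NODEFILE, printing "count hostname" per host.
# Usage: procs_per_node FILE
procs_per_node() {
    sort "$1" | uniq -c | awk '{ print $1, $2 }'
}

# total_entries: number of lines in the node file.
# Usage: total_entries FILE
total_entries() {
    awk 'END { print NR }' "$1"
}

# Inside a batch script one might then write:
#   procs_per_node "$PBS_NODEFILE"
```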
7.2. Other Important Environment Variables
In addition to the PBS environment variables, the table below lists a few other variables which are not generally required, but may be important depending on your job.
| Variable | Description |
|---|---|
| $OMP_NUM_THREADS | The number of OpenMP threads per process |
| $MPI_DSM_DISTRIBUTE | Ensures that memory is assigned closest to the physical core where each MPI process is running |
8. Example Scripts
8.1. MPI Script
The following script is for a 128-core MPI job running for 20 hours in the standard queue.
#!/bin/csh
## Required Directives ------------------------------------
#PBS -l select=ncpus=128:mpiprocs=128
#PBS -l walltime=20:00:00
## Optional Directives ------------------------------------
#PBS -l job_type=MPI
#PBS -q standard
#PBS -A project_ID
#PBS -N testjob1
#PBS -j oe
#PBS -M my_email@yahoo.com
#PBS -m be
## Execution Block ----------------------------------------
# Environmental Setup
# the following environment variable is not required, but will
# optimally assign processes to cores and improve memory use.
setenv MPI_DSM_DISTRIBUTE

# Change to job-specific subdirectory
cd $WORK_DIR

# Stage input data from archive
archive get -C $ARCHIVE_HOME/my_data_dir "*.dat"

# Copy the executable from $HOME
cp $HOME/my_mpiprog.exe ./a.out

## Launching ----------------------------------------------
mpirun -np 128 ./a.out > output_file

## Cleanup ------------------------------------------------
# archive your results
archive put -p -C $ARCHIVE_HOME/my_output_dir *.out output*
8.2. MPI Script (accessing more memory per process)
By default, an MPI job runs one process per core, with all processes sharing the available memory on the node. If you need more memory per process, then your job needs to run fewer MPI processes per node.
The following script requests 128 cores, but because mpiprocs is set to
32, it starts only 32 MPI processes for an MPI job running for
20 hours in the standard queue.
Setting MPI_DSM_PPM to 1 places one process on each four-core memory node, so each process has access to the memory that four processes would otherwise share.
#!/bin/csh
## Required Directives ------------------------------------
#PBS -l select=ncpus=128:mpiprocs=32
#PBS -l walltime=20:00:00
## Optional Directives ------------------------------------
#PBS -l job_type=MPI
#PBS -q standard
#PBS -A project_ID
#PBS -N testjob2
#PBS -j oe
#PBS -M my_email@yahoo.com
#PBS -m be
## Execution Block ----------------------------------------
# Environmental Setup
# the following environment variable is not required, but will
# optimally assign processes to cores and improve memory use.
setenv MPI_DSM_DISTRIBUTE

# Set the number of MPI processes per memory node, which
# may not exceed 4, the number of CPUs per memory node.
setenv MPI_DSM_PPM 1

# Change to job-specific subdirectory
cd $WORK_DIR

# Stage input data from archive
archive get -C $ARCHIVE_HOME/my_data_dir "*.dat"

# Copy the executable from $HOME
cp $HOME/my_mpiprog.exe a.out

## Launching ----------------------------------------------
mpirun -np 32 ./a.out > output_file

## Cleanup ------------------------------------------------
# archive your results
archive put -p -C $ARCHIVE_HOME/my_output_dir *.out output*
8.3. OpenMP Script
The following script is for an OpenMP job using one thread per core on a single node and running for 20 hours in the standard queue.
#!/bin/csh
## Required Directives ------------------------------------
#PBS -l select=1:ncpus=8:mpiprocs=8
#PBS -l walltime=20:00:00
## Optional Directives ------------------------------------
#PBS -l job_type=SMP
#PBS -q standard
#PBS -A project_ID
#PBS -N testjob4
#PBS -j oe
#PBS -M my_email@yahoo.com
#PBS -m be
## Execution Block ----------------------------------------
# Environmental Setup
# Change to job-specific subdirectory
cd $WORK_DIR

# Stage input data from archive
archive get -C $ARCHIVE_HOME/my_data_dir "*.dat"

# Copy the executable from $HOME
cp $HOME/my_ompprog.exe a.out

## Launching ----------------------------------------------
# Threads per rank set by OMP_NUM_THREADS or omplace -nt
setenv OMP_NUM_THREADS 8
omplace -vv -nt 8 ./a.out > output_file

## Cleanup ------------------------------------------------
# archive your results
archive put -p -C $ARCHIVE_HOME/my_output_dir *.out output*
8.4. Hybrid MPI/OpenMP Script
The following script uses 16 cores with two MPI processes, each running eight OpenMP threads, for one thread per core.
#!/bin/csh
## Required Directives ------------------------------------
#PBS -l select=ncpus=16:mpiprocs=2
#PBS -l walltime=20:00:00
## Optional Directives ------------------------------------
#PBS -l job_type=MIX
#PBS -q standard
#PBS -A project_ID
#PBS -N testjob5
#PBS -j oe
#PBS -M my_email@yahoo.com
#PBS -m be
## Execution Block ----------------------------------------
# Environmental Setup
# the following environment variable is not required, but will
# optimally assign processes to cores and improve memory use.
setenv MPI_DSM_DISTRIBUTE

# Change to job-specific subdirectory
cd $WORK_DIR

# Stage input data from archive
archive get -C $ARCHIVE_HOME/my_data_dir "*.dat"

# Copy the executable from $HOME
cp $HOME/my_mixprog.exe a.out

## Launching ----------------------------------------------
# Threads per rank set by OMP_NUM_THREADS or omplace -nt
setenv OMP_NUM_THREADS 8
mpirun -np 2 omplace -vv -nt 8 ./a.out > output_file

## Cleanup ------------------------------------------------
# archive your results
archive put -p -C $ARCHIVE_HOME/my_output_dir *.out output*
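The thread count in a hybrid job follows from the ncpus and mpiprocs values in the select directive when one thread per core is wanted. The sketch below is an illustration for sh or bash; the function name threads_per_rank is hypothetical.

```shell
# threads_per_rank: hypothetical helper that prints the OpenMP thread
# count giving one thread per core: cores per chunk divided by MPI
# processes per chunk. Fails if the cores do not divide evenly.
# Usage: threads_per_rank NCPUS MPIPROCS
threads_per_rank() {
    if [ $(( $1 % $2 )) -ne 0 ]; then
        echo "ncpus must be a multiple of mpiprocs" >&2
        return 1
    fi
    echo $(( $1 / $2 ))
}
```

With ncpus=16 and mpiprocs=2 this yields the value of 8 used for OMP_NUM_THREADS and omplace -nt in the script above.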

