Using MPI via the PBS batch queueing system

Alpha
The Alpha PBS queue consists of 64 processors linked by Myrinet, a fast network. To use the PBS-based MPI on the Alphas, compile and link your program on scout using the commands mpicc, mpif77, or mpif90, as appropriate. These commands automatically include the appropriate files and libraries, and they take the same options as the underlying compilers (gcc, g77, and fort, respectively). Manpages are available for these commands.

Note: If you wish to use the Compaq fort compiler, you must use mpif90 whether your code is Fortran 77 or Fortran 90; fort handles both well.

When you submit an MPI job, the system creates a file gmpiconf.ID, where ID is the queue id of your job, in the /home/pbs directory on scout. This file lists the nodes on which your job runs, which is useful when you need to know where a job's local files ended up. Please do not delete this file until your job has completed.

PBS also provides a built-in means to obtain job information: qstat -f <job id> prints detailed information about <job id>. To determine the nodes on which the job is running, pipe the output of that command through grep exec.
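
The grep step can be wrapped in a small shell function. This is only a sketch: the name job_nodes is hypothetical and is not part of the Centurion setup, and the job id in the usage comment is made up. It assumes, as described above, that the node list appears on the lines of the qstat -f output that contain the string exec.

```shell
#!/bin/sh
# job_nodes (hypothetical helper): print the lines of "qstat -f" output
# that contain "exec", which is where the nodes of a running job are listed.
job_nodes() {
    if [ $# -ne 1 ]; then
        echo "usage: job_nodes <job id>" >&2
        return 1
    fi
    qstat -f "$1" | grep exec
}

# Example (the job id 1234 is made up):
#   job_nodes 1234
```

The function only works while PBS still knows about the job, so run it while the job is queued or running.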

Job Submission

Once your program is compiled and linked, you must submit it via the PBS batch queue system. Scout is the only node from which PBS jobs can be submitted to the Alpha processors. Below is an example that uses eight nodes. In this example, all processes must have NFS access to your home directory, because the program is run from there.

Create a file containing the lines:

#!/bin/sh
#PBS -lnodes=8
mpirun /home/you/project/yourjob.mpp

Note: In our system, the -np option is unnecessary and ignored. This is not always the case for other queueing systems with MPI. If you have existing scripts that specify -np, you may use them unchanged, but the number of nodes for the job is taken from the #PBS -lnodes statement.

To run the job from the front end, type:

$ qsub <script-name>

For example, if you called your script pbsjob.sh, you would type

$ qsub pbsjob.sh

A more complicated example takes advantage of the local scratch space that is available on all of the batch nodes. Each process will be started in a directory on the local disk, so writing dump files and similar output is much faster. You will have to collect the files afterwards, however, so you will need to find out on which nodes your job was run.

#!/bin/sh
#PBS -lnodes=8
mkdir -p /home/localtmp/you/Run
cd /home/localtmp/you/Run
cp /home/you/project/yourjob.a yourjob.mpp
mpirun -np 8 /home/localtmp/you/Run/yourjob.mpp
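
Since the output files end up on the local disks, it helps to record the node list from inside the job. The following is a sketch under stated assumptions, not part of the Centurion setup: record_nodes is a hypothetical helper, the output path is illustrative, and it assumes that PBS sets the PBS_NODEFILE and PBS_JOBID environment variables inside the job, with PBS_NODEFILE naming a file that holds one node name per line. The gmpiconf.ID file in /home/pbs is the site's own record of the same information.

```shell
#!/bin/sh
# record_nodes (hypothetical helper): write the unique node names found in
# $PBS_NODEFILE to the file named by the first argument.
# Assumption: PBS sets PBS_NODEFILE inside the job, one node name per line.
record_nodes() {
    if [ -z "${PBS_NODEFILE:-}" ] || [ ! -r "${PBS_NODEFILE:-}" ]; then
        echo "record_nodes: PBS_NODEFILE is not set or not readable" >&2
        return 1
    fi
    # sort -u removes duplicate entries for the same node
    sort -u "$PBS_NODEFILE" > "$1"
}

# Inside the job script, before mpirun, one might call (path is illustrative):
#   record_nodes /home/you/project/nodes.$PBS_JOBID
```

Writing the list to your NFS-mounted home directory keeps it available after the job has finished, when you go to each node to collect the files.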

You may use qstat to check on your job's status. If you need to delete your job from the queue before it has started, use qdel.
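
The two commands can be combined so that a job is deleted only while it is still waiting in the queue. This is a sketch, not a site-provided tool: cancel_if_queued is a hypothetical name, and it assumes that qstat -f reports a line of the form "job_state = Q" for a job that is queued but has not yet started.

```shell
#!/bin/sh
# cancel_if_queued (hypothetical helper): delete a job with qdel only if
# "qstat -f" reports it in state Q (assumed to mean queued, not yet started).
cancel_if_queued() {
    # Extract the value after "job_state = " from the full job listing.
    state=$(qstat -f "$1" | sed -n 's/^ *job_state = //p')
    if [ "$state" = "Q" ]; then
        qdel "$1"
    else
        echo "job $1 is in state '$state'; not deleting" >&2
        return 1
    fi
}

# Example (the job id 1234 is made up):
#   cancel_if_queued 1234
```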

Consult the manpages on scout for more information about qsub, qstat, and qdel. Another useful manpage is that for pbs_resources.


legion@Virginia.edu
http://legion.virginia.edu/

Last modified: Mon Apr 3 13:18:09 2000