Tutorial:
Starting a remote program in Legion

Table of Contents
What is a remote program?
What kind of programs can Legion run?
What are the prerequisites for running a program in Legion?
Registering programs
Input and output files
Running a single instance remotely
Blocking mode
Nonblocking mode
Option file
Running multiple instances remotely
Scheduling the jobs
Tracking jobs, moving files
Running the jobs
Converting a C/C++ program
Other on-line tutorials & documentation


Depending on how your system is set up, you may need to set up your access to your system before you can run Legion commands. This will probably involve running a command such as this:
$ . ~legion/setup.sh
     or
$ source ~legion/setup.csh
The exact syntax will depend on what kind of shell you are using and where your Legion files are installed (i.e., the value of ~legion will depend on your individual Legion net). Consult your system administrator for more information.


Legion's remote execution facility lets you run programs on remote hosts from the Legion command line. Your workload can be spread across several processors, improving job turnaround time and performance. You can execute a single instance of a program, multiple instances of a single program, or several different programs in parallel.

What is a remote program?
A remote program is an executable process that can be run on a remote host via a command-line utility. The term "remote" here means that the program does not need to be executed on the local host. Note that a remote program is not part of the Legion system but an independent program that may or may not be linked to the Legion libraries. Remote programs can be shell scripts, compilers, existing executables, etc.


What kind of programs can Legion run?
Legion can run two kinds of programs: Legion-linked and independent. A Legion-linked program is linked to the Legion libraries. An independent program is not linked to the Legion libraries. This difference is only important when you are registering the programs (below). After that step, you use the same commands to start the programs.

What are the prerequisites for running a program in Legion?
Before you run a program you must register it with either legion_register_runnable or legion_register_program. The legion_register_runnable command registers Legion-linked programs and the legion_register_program command registers independent programs. Both require three parameters: a program class context path, the program's binary path, and the program's architecture. The program class context path is used to create a Legion class object that will oversee the execution of the program in a Legion system. Once you've registered the program, you can run it with either legion_run or legion_run_multi.

Registering programs
Legion-linked programs
The legion_register_runnable command registers a program that is linked to the Legion libraries. The syntax is:
legion_register_runnable <program class> <executable path> <architecture>

The currently available values for <architecture> are:

solaris corresponds to Sun workstations running Solaris 5.x
sgi corresponds to SGI workstations running IRIX 6.4
linux corresponds to x86 running Red Hat 5.x Linux
x86_freebsd corresponds to x86 running FreeBSD 3.0
alpha_linux corresponds to DEC Alphas running Red Hat 5.x Linux
alpha_DEC corresponds to DEC Alphas running OSF1 v4
rs6000 corresponds to IBM RS/6000s running AIX 4.2.1
hppa_hpux corresponds to HPUX 11
winnt_x86 corresponds to Windows NT 4.0
t90 corresponds to Cray T90s running Unicos 10.x (virtual hosts only)
t3e corresponds to Cray T3Es running Unicos/mk 2.x (virtual hosts only)

The <program class> parameter is a context path name for the Legion objects that will handle a particular program. You can choose any context path, but we suggest one that contains the program's name so that it is easier to remember. For example, to register a program called "Doris" you might use the class name "doris":

$ legion_register_runnable doris /bin/programs/Doris linux
Program class "doris" does not exist.
Creating class "doris".
Registering implementation for class "doris"
$
The output shows that the program class doris is new, so the program has not previously been registered under that class path. You may wish to register multiple implementations of the same program for different architectures. Here, we'll reregister Doris to run on a solaris machine:
$ legion_register_runnable doris /bin/programs/Doris solaris
Registering implementation for class "doris"
$

The output shows Legion registering the new implementation for the program class we registered above, so there's no need to create another class. When we run Doris, we'll be able to run it on linux or solaris. If you reregister a program several times using the same program class and architecture, the most recently registered version will be used.

Please note that when Legion registers a program, it makes a copy of the binary executable and uses that copy when you execute the program with legion_run or legion_run_multi. If you change the program, you'll need to reregister it.

Independent programs
Independent programs are registered with the legion_register_program command. The syntax is:

legion_register_program <program class> <executable path> <legion arch>

The currently available values for <legion arch> are the same as the <architecture> values listed above.

The <program class> parameter is a context path name for the class that will manage Legion objects related to a particular program. An example of this command might look like this:

$ legion_register_program doris /bin/programs/Doris linux
Program class "doris" does not exist.
Creating class "doris".
Registering implementation for class "doris"
$
The output shows that the program class doris is new, so the program has not previously been registered under that class path. You may wish to register multiple implementations of the same program for different architectures. Here, we'll reregister Doris to run on a solaris machine:
$ legion_register_program doris /bin/programs/Doris solaris
Registering implementation for class "doris"
$

The output shows Legion registering the new implementation for the program class we registered above, so there's no need to create another class. When we run Doris, we'll be able to run it on linux or solaris. If you reregister a program several times using the same program class and architecture, the most recently registered version will be used.

Please note that when Legion registers a program, it makes a copy of the binary executable and uses that copy when you execute the program with legion_run or legion_run_multi. If you change the program, you'll need to reregister it.
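The choice between the two registration commands can be sketched in a few lines of Python. This is only an illustration: registration_command is a hypothetical helper, not part of Legion, and it assembles the command line without running it. The architecture names are copied from the list above.

```python
# Architecture names copied from the list earlier in this tutorial.
KNOWN_ARCHITECTURES = {
    "solaris", "sgi", "linux", "x86_freebsd", "alpha_linux", "alpha_DEC",
    "rs6000", "hppa_hpux", "winnt_x86", "t90", "t3e",
}


def registration_command(program_class, executable_path, architecture,
                         legion_linked):
    """Return the argument list for registering one implementation.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if architecture not in KNOWN_ARCHITECTURES:
        raise ValueError("unknown architecture: " + architecture)
    # Legion-linked and independent programs use different commands,
    # but both take the same three parameters in the same order.
    command = ("legion_register_runnable" if legion_linked
               else "legion_register_program")
    return [command, program_class, executable_path, architecture]


print(" ".join(registration_command("doris", "/bin/programs/Doris",
                                    "linux", legion_linked=False)))
# prints: legion_register_program doris /bin/programs/Doris linux
```

Checking the architecture name before calling the command catches a mistyped value early, before Legion copies the binary.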


Input and output files
In all likelihood, your program will expect to find certain input files in its local execution environment when it runs and it will produce certain output files. You'll therefore need to copy input files to the remote host before or while the program runs and output files from the remote host after it runs. If you know the name of the remote host and which directory will be used you can move the files around by hand, but it is easier to use the -in/-IN and -out/-OUT flags. These flags have the following purposes:

-in    Copy a file from your context space to the remote job's working directory.
-IN    Copy a file from your local host (i.e., the legion_run local execution environment) to the remote job's working directory.
-out   Copy a file from the remote job's working directory into your context space.
-OUT   Copy a file from the remote job's working directory onto your local host.

You can use these flags with legion_run, when you first start the job, or with legion_probe_run, after the job has started.

Note that you can ask for multiple input and output files when you execute a program by repeating the -in/-out and -IN/-OUT options.
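As a rough sketch of how the repeated options fit together, the Python function below turns four lists of file names into the matching flags. The function name file_flags and its parameter names are our own inventions for illustration; only the four flags come from Legion.

```python
def file_flags(context_in=(), local_in=(), context_out=(), local_out=()):
    """Build the repeated -in/-IN/-out/-OUT options for one job."""
    groups = [
        ("-in", context_in),    # context space -> remote working directory
        ("-IN", local_in),      # local host -> remote working directory
        ("-out", context_out),  # remote working directory -> context space
        ("-OUT", local_out),    # remote working directory -> local host
    ]
    flags = []
    for flag, names in groups:
        for name in names:
            flags += [flag, name]  # one flag per file, repeated as needed
    return flags


print(" ".join(file_flags(local_in=["input1", "input2"],
                          local_out=["output1", "output2"])))
# prints: -IN input1 -IN input2 -OUT output1 -OUT output2
```

The lowercase flags always refer to context space and the uppercase flags to the local host, which is easy to mix up when typing a long command by hand.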


Running a single instance remotely
The legion_run command executes a single instance of a program. If you are running a serial program with many input files and/or multiple executions you may prefer to use the legion_run_multi command.

There are many options associated with this command but we will not consider all of them here. In this section of the tutorial, we will look at the steps in two different runs of the fictional program Doris. We will assume that this program was previously registered with the program class name doris and linux architecture and is now ready to run.

See the Reference Manual for an explanation of the options and "Executing programs remotely" in the Basic User Manual for more information on using this command.

In both cases, our first decision is whether to run the program in blocking or nonblocking mode. Let's assume that Doris only takes a few minutes to run, so first we want to run the command in its default (blocking) mode. To be even-handed, though, we'll then run it in nonblocking mode. Let us further suppose that Doris needs two input files, "input1" and "input2", and produces two output files "output1" and "output2".

Before we start, we want to be sure that we have a tty object set for the current shell:

$ legion_tty /tty_objects/mytty

Finally, we want to decide whether or not to start a probe file. We don't have to start one, but since we want to check Doris's status as it is running, we will use the -p option to start a probe file called "dorisProbe".

Blocking mode
When you run a program in blocking mode, the legion_run command will continue to run on your command line, blocking any new input until the remote job has finished. Once the job is done, all output files are collected, the job's working directory on the remote host is cleaned up and deleted, and the command exits. Since our program, Doris, doesn't take long to run, the wait won't be much of an issue.

$ legion_run -IN input1 -IN input2 \
  -OUT output1 -OUT output2 -p dorisProbe doris

We used the -IN flag to tell Legion to copy the input files from our local host to the remote host before the job started, although you can also pass these names to legion_probe_run after the job has started. Note that we didn't need to tell Legion to run in blocking mode, since that's the default setting.

Now that the job has started, we go to another command line and use legion_probe_run's -hostname and -statjob flags to check on its status and see what host it is running on (being sure to set the tty object in that shell first).

$ legion_probe_run -hostname -statjob -p dorisProbe doris
gander.cs.virginia.edu
Running

The output shows that the job was placed on gander.cs.virginia.edu and that it is still running. If we had wished to place the job on a specific host, we could have used legion_run's -h flag. We also could have used its -d flag to place it in a specific working directory on the remote host.

If you need to kill the job midway through, you can use legion_probe_run's -kill flag. Note, though, that legion_run will continue to block at your command line so you will need to kill it by hand.

Once the program has finished, legion_run copies the -OUT files to our local host, deletes the probe files, and cleans up the remote host.

Nonblocking mode
When you run a program in nonblocking mode, the legion_run command starts the job on the remote host, waits ten seconds, verifies that the job can start, and then exits. When the job finishes, it copies any files marked with the -out flag into context space but ignores any files marked with the -OUT flag. The remote host can hold on to the job's working directory for six hours, but if you do not take steps to pick up the remaining output files and clean up the remote host during that period, the remote host will tar and compress the job's working directory and put it in your context scratch space.

$ legion_run -nonblock -IN input1 -IN input2 \
  -OUT output1 -OUT output2 -p dorisProbe doris

We used the -nonblock flag to start the job in nonblocking mode and we started a probe file. We strongly suggest that you always use a probe file with nonblocking runs. As above, we used the -IN flag to tell Legion to copy the input files from our local host to the remote host before the job started.

We can use the legion_probe_run command to check on its status and see what host it is running on. We'll also use the -pwd flag to get the name of the job's current working directory.

$ legion_probe_run -hostname -pwd -statjob -p dorisProbe doris
gander.cs.virginia.edu
/myDir/OPR/BootstrapVaultOPR/ReservedOPR-1.80
Running

The output shows that the job was placed on gander.cs.virginia.edu and that it is still running. The -statjob flag is especially useful in nonblocking runs, since you have no other way to find out if the job has finished or hit an error.
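If you check on a nonblocking job from a script rather than by hand, the probe output has to be read back and the check repeated. The Python sketch below shows one way to do that, under two assumptions of ours: that legion_probe_run reports one line per flag in the order the flags were given (as in the sample above), and that any status other than "Running" means the job is no longer running. Both function names are hypothetical, and probe stands for any callable that returns the text legion_probe_run reported.

```python
import time


def parse_probe_output(text, fields=("hostname", "pwd", "statjob")):
    """Pair each non-empty output line with the flag that requested it.

    Assumes one line per flag, in the order the flags were given.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != len(fields):
        raise ValueError("expected %d lines, got %d"
                         % (len(fields), len(lines)))
    return dict(zip(fields, lines))


def wait_while_running(probe, interval=30.0, sleep=time.sleep):
    """Call probe() until the reported status is no longer "Running".

    probe is a caller-supplied callable returning the probe output as text.
    """
    while True:
        report = parse_probe_output(probe())
        if report["statjob"] != "Running":
            return report
        sleep(interval)  # wait before asking again


sample = """gander.cs.virginia.edu
/myDir/OPR/BootstrapVaultOPR/ReservedOPR-1.80
Running
"""
print(parse_probe_output(sample)["hostname"])
# prints: gander.cs.virginia.edu
```

Passing the probe in as a callable keeps the loop independent of how the output is captured, so the same code can be exercised with canned text.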

Once the program has finished, the remote host will hold on to the /myDir/OPR/BootstrapVaultOPR/ReservedOPR-1.80 directory for six hours. Even though we told legion_run to pick up the two output files, the command has exited by now.* If we do nothing, gander.cs.virginia.edu will tar up the directory and put it in our context scratch space. We will then have to retrieve it, untar it, and get the output files.

To avoid this hassle, use legion_probe_run's -OUT and -kill options. First, we pick up the output files:

$ legion_probe_run -OUT output1 -OUT output2 -p dorisProbe doris

Then we clean up the remote directory:

$ legion_probe_run -kill -p dorisProbe doris

A word of caution about the -kill flag: if the job is still running, it will terminate the job (hence the name). You will lose all data from the job, including all output files.
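The order of the two steps matters, so a script that performs them may want a guard. The Python sketch below builds the two legion_probe_run command lines shown above and refuses to do so while the job is still running. cleanup_commands is a hypothetical helper; the caller decides how to find out whether the job is still running (for example, from -statjob).

```python
def cleanup_commands(still_running, local_outputs, probe_file, program_class):
    """Return the legion_probe_run commands to collect outputs, then clean up.

    Refuses to plan a -kill while the job is still running, because that
    would terminate the job and discard its output files.
    """
    if still_running:
        raise RuntimeError("job is still running; -kill would terminate it")
    target = ["-p", probe_file, program_class]
    collect = ["legion_probe_run"]
    for name in local_outputs:
        collect += ["-OUT", name]  # one -OUT per file to bring back
    # Collect first, kill second: -kill removes the remote directory.
    return [collect + target, ["legion_probe_run", "-kill"] + target]


for command in cleanup_commands(False, ["output1", "output2"],
                                "dorisProbe", "doris"):
    print(" ".join(command))
# prints:
# legion_probe_run -OUT output1 -OUT output2 -p dorisProbe doris
# legion_probe_run -kill -p dorisProbe doris
```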

Option file
If you are trying to keep track of several options when you start your program or are running the program over and over again, you may wish to use an option file. This is a text file that contains a list of the flags and settings you want to use when running your program. You can include any legion_run flags, but you must put the program class name and any arbitrary command-line arguments on the command line. You must also put the -f flag and the option file name on the command line. For example, if we used an option file for the nonblocking Doris run above, it would look like this:

-nonblock
-IN input1
-IN input2
-OUT output1
-OUT output2
-p dorisProbe

In this example we separated the flags by new lines, but you can also use tabs or blank spaces. Doris has no arguments, so we can start it by entering:

$ legion_run -f dorisOptionFile doris

This may be helpful if several people want to run the program at different points, since they can share the option file.
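To illustrate why new lines, tabs, and blank spaces are interchangeable in an option file, here is a small Python sketch that splits the file's text on any whitespace. It is a simplified model of our own, not Legion's parser, and it assumes that no file name contains white space.

```python
def read_options(text):
    """Split option-file text into flags and values on any whitespace."""
    return text.split()


by_lines = """-nonblock
-IN input1
-IN input2
-OUT output1
-OUT output2
-p dorisProbe
"""
by_spaces = "-nonblock -IN input1 -IN input2\t-OUT output1 -OUT output2 -p dorisProbe"

# Both layouts describe the same list of options.
print(read_options(by_lines) == read_options(by_spaces))
# prints: True
print(read_options(by_lines)[:3])
# prints: ['-nonblock', '-IN', 'input1']
```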


Running multiple instances remotely
The legion_run_multi command starts multiple instances of a program on one or more Legion hosts. It is essentially a script that starts multiple copies of a command on a set of hosts. By default it starts multiple copies of legion_run in blocking mode, and that is its expected purpose, but you can start copies of other Legion commands with the -e flag. It uses a specification file, which is somewhat like the legion_run option file, to keep track of input and output files that may be distributed on several hosts.

This command is complex but more flexible than legion_run. There is a detailed FAQ about this command, as well as documentation on-line and in the Reference Manual. Rather than repeating information provided elsewhere, this section will go through the steps of starting the fictional program Gladys. We'll assume that you've already registered it with the class name gladys and a Linux architecture.

First of all, we want to be sure that we have a tty object set for the current shell:

$ legion_tty /tty_objects/mytty

Scheduling the jobs
You must tell Legion the maximum number of processes that can run at a time and/or the name of a schedule file. You give the number of processes with the -n flag and the schedule file with the -s flag. In this case we'll just use a schedule file, called GladysSchedule.txt. Each line of the file names a host and the maximum number of Gladys jobs we want to run on it:

/hosts/Abba   5
/hosts/Dabba  8
/hosts/Doo    3

Please be sure that the hosts listed in your schedule file are all of an appropriate architecture: remember that Gladys is registered to run on linux, so we don't want to try to run it on an RS/6000.
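As a quick check on a schedule file, the Python sketch below reads the host and job-limit columns and adds up the limits. It assumes the two-column layout of the example above (a host context path, white space, then a number). parse_schedule is a hypothetical helper, not a Legion command.

```python
def parse_schedule(text):
    """Return {host context path: job limit} from schedule-file text."""
    schedule = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        host, limit = line.split()
        schedule[host] = int(limit)
    return schedule


schedule = parse_schedule("""
/hosts/Abba   5
/hosts/Dabba  8
/hosts/Doo    3
""")
print(sum(schedule.values()))
# prints: 16
```

With this schedule, up to sixteen copies of Gladys can be running across the three hosts at once.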

Tracking jobs, moving files
Every copy of Gladys will expect to find files called "Gladys1", "Gladys2", "Gladys3", and "password" in its local execution environment. Since several copies will be running on three different hosts, the expected input files must be passed to each copy's working directory. Each copy will also produce output files called "GladysDone" and "GladysAlt". If we were running a single instance there would be no problem, but when running multiple instances we have to consider the prospect of sixteen different "GladysDone" files. Furthermore, how is legion_run_multi (which runs in blocking mode, remember) going to know when all of the jobs are finished or if some have failed?

The solution to this is a specification file. This file controls the distribution and collection of input and output files, constants, and standard input, output, and error files on the different hosts. It also keeps track of each job's status, so that if a job fails it can be moved to another host and restarted. Each line contains three fields: a keyword, a filename, and a pattern. These fields are used to compile a list of jobs to run. For example, our specification file is called GladysSpecification.txt and looks like this:

IN           Gladys1       Gladinput*txt
in           Gladys2       /myContext/Gladys/Gladnew*.ent
in           Gladys3       /myContext/Gladys/Glad*alt.txt
OUT          GladysDone    Gladoutput*.txt
OUT          GladysAlt     /myContext/Gladys/Gladdone*.txt
CONSTANT     password      /crypt/Gladpasswd

Legion compiles a list of input files that fit the pattern field and looks for jobs to be run. If it finds GladnewFoo.ent and GladFooalt.txt in /myContext/Gladys/, that is a potential job that can be called job Foo. It then looks for files that match the output files. If it finds "GladoutputFoo.txt" on the local host and /myContext/Gladys/GladdoneFoo.txt in context space, it knows that job Foo has already been run and that these input files can be ignored. If it doesn't find those two Foo output files, it knows that job Foo needs to be run.
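The matching just described can be modelled in a few lines of Python. This is a simplified sketch of the idea, not Legion's implementation: it assumes each pattern contains exactly one "*", it treats local files and context-space files as one flat set of names, and it uses only the two context-space input patterns mentioned in the paragraph above. The function names are hypothetical.

```python
import re


def wildcard_values(pattern, names):
    """Return what '*' stands for in every name that fits the pattern."""
    prefix, suffix = pattern.split("*", 1)
    regex = re.compile(re.escape(prefix) + "(.+)" + re.escape(suffix))
    values = set()
    for name in names:
        match = regex.fullmatch(name)
        if match:
            values.add(match.group(1))
    return values


def find_jobs(input_patterns, output_patterns, names):
    """Split candidate jobs into (still to run, already done)."""
    # A job is a candidate only if every input pattern has a file for it.
    candidates = set.intersection(
        *(wildcard_values(p, names) for p in input_patterns))
    # A job is done only if every output pattern has a file for it.
    done = {job for job in candidates
            if all(p.replace("*", job, 1) in names for p in output_patterns)}
    return candidates - done, done


names = {
    "/myContext/Gladys/GladnewFoo.ent", "/myContext/Gladys/GladFooalt.txt",
    "/myContext/Gladys/GladnewBar.ent", "/myContext/Gladys/GladBaralt.txt",
    "GladoutputFoo.txt", "/myContext/Gladys/GladdoneFoo.txt",
}
to_run, done = find_jobs(
    ["/myContext/Gladys/Gladnew*.ent", "/myContext/Gladys/Glad*alt.txt"],
    ["Gladoutput*.txt", "/myContext/Gladys/Gladdone*.txt"],
    names)
print(sorted(to_run), sorted(done))
# prints: ['Bar'] ['Foo']
```

Here job Foo is skipped because both of its output files already exist, while job Bar has inputs but no outputs and therefore still needs to run.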

If necessary, we can fine-tune the specification file with an exception file. This file provides extra information for some of your program's jobs. For example, if we wanted to be sure that job Foo runs on the Abba host, we could use an exception file.

Running the jobs
We can now start the program:

$ legion_run_multi -v -s GladysSchedule.txt -f GladysSpecification.txt gladys

We suggest that you always use the -v flag so that you can see what is happening, especially if the program will take more than a few seconds to run.

We are starting multiple instances of legion_run. Each copy of legion_run starts one of the Gladys jobs: Foo, Bar, and Beowulf. Each copy also runs in nonblocking mode and automatically creates a probe file for each job. These files are called "legion_probe_run_<job name>" and are in the ./.legion_run_multi_<random number>_ directory. In our example, the probe files would be called:

legion_probe_run_Foo
legion_probe_run_Bar
legion_probe_run_Beowulf

You can check any of these jobs from the command line by running legion_probe_run.
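As a sketch of how those checks could be scripted, the Python below builds the probe file path and a legion_probe_run -statjob command line for each job. The helper names are hypothetical, and the directory name used in the example contains a made-up number standing in for the random one that legion_run_multi chooses.

```python
import posixpath


def probe_file(job_name, directory):
    """Path of the probe file that legion_run_multi creates for a job."""
    return posixpath.join(directory, "legion_probe_run_" + job_name)


def status_command(job_name, directory, program_class):
    """Command line for checking one job's status with its probe file."""
    return ["legion_probe_run", "-statjob",
            "-p", probe_file(job_name, directory), program_class]


# Hypothetical directory name: the real number is chosen at run time.
directory = ".legion_run_multi_12345_"
for job in ["Foo", "Bar", "Beowulf"]:
    print(" ".join(status_command(job, directory, "gladys")))
# prints:
# legion_probe_run -statjob -p .legion_run_multi_12345_/legion_probe_run_Foo gladys
# legion_probe_run -statjob -p .legion_run_multi_12345_/legion_probe_run_Bar gladys
# legion_probe_run -statjob -p .legion_run_multi_12345_/legion_probe_run_Beowulf gladys
```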


Converting a C/C++ program
Any C or C++ program can be easily made into a Legion runnable object using the following steps:

  1. The program should export a C-linkable legion_main function in place of its main function.

  2. The program should be linked to the Legion libraries. Add the following to your link line:
    -lLegionRun -lLegion1 -lLegion2


Other relevant on-line documents:
Logging in to a running Legion system
Introduction to Legion context space
Context-related commands
Legion tty objects
Running a PVM code in Legion
Running a Legion MPI code
Running native MPI code
Quick list of all 1.7 Legion commands
Usage of all 1.7 Legion commands
FAQs for running programs in Legion
Starting a new Legion system
Legion security
Legion host and vault objects
Adding host and vault objects
Brief descriptions of all on-line tutorials


* The exception to this rule occurs if your program runs in less than ten seconds. Since legion_run always blocks for ten seconds to be sure that the remote job can start, if your job finishes in that period the -OUT files will be picked up.

Last modified: Tue Sep 5 17:23:55 2000

 


legion@Virginia.edu
http://legion.virginia.edu/