8.0 Executing programs remotely

Remote execution allows you to run programs on remote hosts from the Legion command line, taking advantage of Legion's distributed resources. Your workload can be spread across several processors, improving job turnaround time and performance. You can execute multiple instances of a single program or execute several different programs in parallel.

A remote program is an executable process that can be started with a command-line utility. Note that a remote program is not part of the Legion system, but is an independent program that may or may not be linked to Legion. Remote programs might be shell scripts, compilers, existing executables, etc. The term "remote" here means that the program is not, or may not be, executed on the local host.

8.1 Linked and independent programs

Legion permits two kinds of programs to be run remotely: Legion-linked and independent. A Legion-linked program is linked to the Legion libraries. An independent program is not linked to the Legion libraries. For example, a shell script or a program written and compiled on a non-Legion system is an independent program.

8.2 Registering independent programs

Before either a linked or independent remote program can be run from Legion it must be registered. The legion_register_runnable command registers Legion-linked programs. The legion_register_program command registers independent programs. Both of these commands require parameters containing a context path name, the program's binary path name, and the binary's architecture. You can register a program multiple times to run on different architectures. The legion_run command runs registered programs.

8.2.1 Independent programs

Independent remote programs are registered with legion_register_program. Registration includes information about the program's architecture (e.g., linux, sgi, solaris), its binary path name, and a program class. Once the program has been registered, it can be executed with legion_run.
legion_register_program

The <program class> parameter is a context path name for a class that will manage the program's Legion objects. The path name can be whatever you find most convenient. You might want to use the program's name so that it will be easier to remember (e.g., if you are registering the program MyProgram, you might use MyProgram as the <program class> parameter). The context path name can be either new or a previously created path name (if you are registering multiple versions of the same program). It will refer to a (new or previously created) Legion object that permits an independent remote program to execute.

If multiple programs are registered with the same program class and architecture, the most recently registered version will be used when the program is run on that particular architecture. Be sure to reregister the program if you recompile it: Legion copies your executable into context space only when you register it and will not update its copy when you update the original.

An example of this command might look like this:
This output shows Legion creating the program class myProgram, as the command requested. If this class had been previously created, the output would look like this:
8.2.2 Legion-linked programs

The legion_register_runnable command registers programs that are linked to the Legion libraries and export a runnable object interface. The command creates an interface between the Legion system and the program. Like legion_register_program, this command requires the remote program's program class, executable binary path, and the binary's architecture as parameters.

legion_register_runnable

The <program class> parameter is a context path name for the Legion objects that will handle a particular program. You can choose a context path that suits your organizational scheme. If you reregister a program and use the same program class name, Legion will use the most recent registration information when running the program.

Once the program has been registered, you can run it with legion_run. You must reregister the program if you recompile it (Legion copies your executable to context space when you register it, so the most recently registered copy will be run).
This output shows Legion creating the program class myProgram, as the command requested. If this class had been previously created, Legion would simply register the new information:
8.3 Running a program remotely

Use legion_run to start a single instance of a program on a remote host. If you are running a serial program with many input files and/or multiple executions you may prefer to use the legion_run_multi command (see page 104 in the Reference Manual). The program must be registered with either legion_register_program (for independent programs not linked to the Legion libraries) or legion_register_runnable (for Legion-linked programs) before you run it.

There are a number of optional parameters associated with legion_run. To keep things simple, we will not discuss all of them here: please see page 100 for the complete syntax and an explanation of all options.

The legion_probe_run command is a useful aid when running remote programs. It lets you pass input files to the remote host after the program has started running, pick up output files, check the job's status, and clean up the remote host after the job has finished.

8.3.1 Choosing a remote host

Legion will choose a host with an appropriate architecture if you do not specify one with the -h flag. The host must be part of your system (i.e., have a host object and be listed in the /hosts context; if necessary, see "Adding a new host" in the System Administrator Manual).

8.3.2 Command-line arguments

If your program requires command-line arguments, you must include them as the final parameters to legion_run. They cannot be put in an option file.

8.3.3 Getting input files to the remote host

Your program may expect to find certain files in its local directory at run time, so when you execute it on a remote host you must pass copies of any expected input files to that host. If you know the name of the remote host, you can pass the files by hand, but otherwise you'll find it much easier to tell Legion the names of the files that need to be copied. You can do this before or after the program starts.
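The ordering rule from section 8.3.2 and the optional -h flag from section 8.3.1 can be sketched in a few lines of Python. This is only an illustration, not part of Legion: the helper name build_legion_run_argv, the host path /hosts/myhost, and the program class myProgram are hypothetical, and the sketch assumes that -h takes a single host context path. It prints the command line rather than running it; check the exact syntax in the Reference Manual before relying on it.

```python
import shlex


def build_legion_run_argv(program_class, program_args=(), host=None):
    """Assemble a legion_run command line (hypothetical helper)."""
    argv = ["legion_run"]
    if host is not None:
        # Assumption: -h takes one host context path. Without -h,
        # Legion chooses a host with an appropriate architecture.
        argv += ["-h", host]
    # The program class follows the flags, and the program's own
    # command-line arguments must be the final parameters.
    argv.append(program_class)
    argv.extend(str(arg) for arg in program_args)
    return argv


if __name__ == "__main__":
    argv = build_legion_run_argv("myProgram", ["-n", 10], host="/hosts/myhost")
    # Dry run: show the command instead of executing it.
    print(shlex.join(argv))
```

Building the command as a list keeps the program's arguments physically last, so an added flag can never end up after them by accident.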
8.3.3.1 Before the program starts

You can use the -in and -IN flags to move files onto the remote host before the program starts. The -in flag gets files from context space and the -IN flag gets them from the local host. Alternatively, use legion_probe_run to move files to the remote host after the program starts.

8.3.3.2 After the program has started

Use legion_probe_run to move input files to the remote host after the program has started running (see section 8.3.8). If you wish to use this method, please note that you must have started the program with a probe file (see section 8.3.6). You can use the -in or -IN flags to pass input files from context space or local file space to the remote job's current working directory. These flags are identical to legion_run's -in and -IN flags, described above.

8.3.4 Getting output files from the remote host

Once the program has finished, you need to retrieve the output files. If you know the name of the remote host and you are running in nonblocking mode you can get the files by hand (see section 8.3.7). Otherwise, you will need to tell Legion the names of the output files that you wish to retrieve and whether you wish to put them on your local host or in context space. You can do this before or after the program starts.

8.3.4.1 Before the program starts

You can use legion_run's -out and -OUT flags to specify which files you want copied from the remote host. The -out flag copies files into context space and the -OUT flag copies them onto your local host. If the program has terminated because of a crash, Legion may not be able to copy all of the results.

Into context space
Onto your local host
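A minimal Python sketch of how the four file-transfer flags from sections 8.3.3.1 and 8.3.4.1 might be combined on one legion_run command line. The helper name file_transfer_flags, the file names, and the program class myProgram are hypothetical, and the sketch assumes that each flag takes a single file name and may be repeated; consult the Reference Manual for the actual syntax.

```python
import shlex


def file_transfer_flags(in_context=(), in_local=(), out_context=(), out_local=()):
    """Return legion_run file-transfer flags as a list (hypothetical helper)."""
    flag_groups = [
        ("-in", in_context),    # input files taken from context space
        ("-IN", in_local),      # input files taken from the local host
        ("-out", out_context),  # output files copied into context space
        ("-OUT", out_local),    # output files copied onto the local host
    ]
    flags = []
    for flag, names in flag_groups:
        for name in names:
            # Assumption: one file name per flag, flag repeated per file.
            flags += [flag, name]
    return flags


if __name__ == "__main__":
    flags = file_transfer_flags(in_local=["input.dat"],
                                out_context=["result.dat"],
                                out_local=["run.log"])
    # Flags come first; the program class stays at the end.
    print(shlex.join(["legion_run", *flags, "myProgram"]))
```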
8.3.4.2 After the program has started

You can use the -out and -OUT flags with legion_probe_run to specify which files you want copied from the remote host after the program has started running. If you wish to use this method, please note that you must have started the program with a probe file (see section 8.3.6). The -out flag copies files into context space. The -OUT flag copies them onto your local host. These flags are identical to legion_run's -out and -OUT flags.

8.3.5 Option file

If you find that you are trying to keep track of too many options when you start legion_run, you may wish to use an option file. This is a text file that holds a list of all of your flags and optional settings for that run. The file can be delimited with tabs, spaces, or new lines and can contain any of the legion_run flags. It cannot contain the program class name or any command-line arguments: those must appear on the command line, along with the -f flag and the option file name. Please note that you can only use legion_run flags in this file.

8.3.6 Creating and using a probe file

A probe file is a Unix file on your local machine. The legion_probe_run command uses it to contact a remote job. To create one, simply use the -p flag when you start legion_run. If you run in blocking mode, legion_run will automatically remove a job's probe file when the job is finished. If you run in nonblocking mode, legion_probe_run's -kill option will delete it as part of its cleaning-up operation. Otherwise, the file will remain in your local file space.

The probe file is good for only one remote job, so if you reuse the name Legion will simply write over the previous file if it still exists.

8.3.7 Blocking vs. nonblocking

You can run the legion_run command in blocking or nonblocking mode. The default mode is blocking. This means that legion_run will continue to run on your command line until the remote job has finished. All output files will be collected from the remote host, the remote directory that was used to run the job will be cleaned up, and the command will exit.

Blocking

The basic steps in running a blocking remote job are shown below in Figure 6.
In this example, a user starts a remote job on his local host in blocking mode. The following events occur:
You can use ^C to kill the job prematurely.

Nonblocking

If you start legion_run with the -nonblock flag, on the other hand, the command will start the program on the remote host, wait ten seconds, verify that the program can start, and then exit. When the remote host has finished executing the program, the job will copy any files named with the -out flag into context space but ignore any files named with the -OUT flag. The job's working directory will remain on the remote host. The basic steps are shown below in Figure 7.
In this example, a user starts a remote job on his local host using the -nonblock flag. The following events occur:
Since this is nonblocking mode, the remote job's working directory does not get cleaned up. The remote host will hold on to the job's working directory for six hours. During this period, the user can remove any remaining output files by hand. If he created a probe file, he can copy output files to his local host with legion_probe_run's -OUT flag.[1] If the user does not clean up the remote host in that period, the entire directory will be tarred and moved into the user's context scratch space (see section 8.3.9).

8.3.8 About legion_probe_run

The legion_probe_run command checks a remote job that was started with legion_run. You must know the name of the job's probe file (see above). You can use this command to pass input files, pick up output files, find what host is running your program, check the job's status, see what files are in the job's current working directory, kill the job, and clean up the remote host when the job is finished. An example is shown in Figure 8, below.
In this example, a user starts a job using -nonblock to run it in nonblocking mode and -p to create a probe file. The following events occur:
The user can then run legion_probe_run with the -kill flag. This destroys the remote job's working directory and the probe file. Please note that if you use this flag before the job is finished you will terminate the job and lose all data. Please see page 96 in the Reference Manual for more information about this command.

8.3.9 Context scratch space

Legion uses context space as backup storage (scratch space) for any remote jobs that finish and are not cleaned up by legion_run or legion_probe_run. If the job finishes and is not picked up or checked for six hours, the remote host will tar, compress, and move the job's working directory into your context scratch space. The default scratch space is /tmp but you can set it to anywhere in your context space.[2] There are several ways to do this.
If you set it multiple times, the most recent setting will be used.

8.3.10 Retrieving files from context scratch space

If your remote job's working directory was sent to context scratch space, you'll need to copy it to your local host, then uncompress and untar it in order to use it. First, run legion_cp:
Be sure to give the local file name a .tar.Z suffix. You then need to run uncompress and tar:[3]

8.4 Example

Suppose that you have a program called Doris that you wish to run on a remote linux host. The first step is to determine whether it is linked to the Legion libraries (Legion-linked) or not (independent). In this case, we'll suppose that Doris is not linked to the Legion libraries.
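Because Doris is independent, it would be registered with legion_register_program and then started with legion_run. The Python sketch below assembles those two command lines as a dry run. It assumes the parameter order described in section 8.2 (program class, binary path name, architecture); the helper name doris_commands, the binary path /home/me/bin/Doris, and the argument input.txt are hypothetical.

```python
import shlex


def doris_commands(binary_path, arch="linux", program_args=()):
    """Return the register and run command lines for Doris (hypothetical helper)."""
    program_class = "Doris"  # context path name chosen for the program class
    # Assumed parameter order: program class, binary path, architecture.
    register = ["legion_register_program", program_class, binary_path, arch]
    # The program's own arguments are the final parameters to legion_run.
    run = ["legion_run", program_class, *(str(a) for a in program_args)]
    return register, run


if __name__ == "__main__":
    for argv in doris_commands("/home/me/bin/Doris", program_args=["input.txt"]):
        # Dry run: print each command instead of executing it.
        print(shlex.join(argv))
```

If Doris is later recompiled, the registration command has to be issued again, since Legion keeps its own copy of the executable.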
8.5 Converting a C/C++ program

Any C or C++ program can be made into a Legion runnable object using the following steps:
1. If you run legion_probe_run on a terminated job, the clock will restart and give you another six hours.

2. You may need a different scratch space if security is enabled. You may not have permission to work in the /tmp context or your system administrator may have removed the context altogether.

3. These are common Unix tools that should be available on all Unix platforms. If you do not have access to these tools, please contact us at legion-help@virginia.edu.