Legion: A Worldwide Virtual Computer
Home General Documentation Software Testbeds Et Cetera Map/Search

legion_mpi_run FAQ

1. What does this command do?[go]
2. What is the difference between Legion MPI and native MPI?[go]
3. Can I run a native MPI program as a Legion MPI program?[go]
4. Can I start more than one program at a time?[go]
5. Which MPI functions are supported?[go]
6. What are the prerequisites?[go]
7. What is an option file?[go]
8. What if I don't use an option file?[go]
9. How can I control which hosts run my program(s)?[go]
10. How can I track my program's instances?[go]
11. I want to put my program's instances in a specific context[go]
12. I need to control placement of the first process[go]
13. I need all of my children to be in a specific directory on the remote host(s)[go]
14. What is a specification file?[go]
15. My program expects input files -- how do I get them to the remote host?[go]
16. My program will produce output files -- how do I get them back?[go]
17. How do I use wild cards?[go]
18. Can I set stdin/stdout/stderr on the remote hosts I'm using?[go]
19. Can I set environment variables for MPI processes?[go]
20. What if a program or host crashes?[go]
21. Can I use the fault tolerant mode with all of my applications?[go]
22. How do I adapt my program to use the checkpoint library?[go]
23. Do I need to do anything special to compile a C application with the checkpoint library?[go]
24. Do I need to do anything special to compile a Fortran application with the checkpoint library?[go]
25. OK, I've adapted my code. What do I do to get it started?[go]
26. What are the fault tolerant flags?[go]
27. Can I deactivate my program?[go]
28. Can I debug my instances?[go]
29. Some hints to make life easier[go]

What does this command do?

It starts one or more previously registered MPI programs on one or more Legion hosts. It is fully documented here.
What is the difference between Legion MPI and native MPI?
There are two ways to run MPI programs on Legion: Legion MPI and native MPI. Legion MPI programs uses code adapted to run in Legion and to link to Legion libraries. They can only be run on machines with Legion binaries installed. A native MPI program does not need to be edited for or compiled in Legion, nor does it need to be linked to the Legion libraries (although you can if you wish).
Can I run a native MPI program as a Legion MPI program?
No, you must adapt native MPI code in order to run it as Legion MPI. However, you can run native MPI code on Legion with legion_native_mpi_run
Can I start more than one program at a time in a single run?
Yes, if you use an option file.
Which MPI functions are supported?
All MPI 1.1 functions and some MPI 2.0 functions are currently supported in Legion MPI. Please contact us at legion-help@virginia.edu for help.
What are the prerequisites?
You must have previously registered the program(s) with Legion, with legion_mpi_register (see section 10.1.4 in the Basic User Manual). If there are any required input files, they must be visible in your local file space or context space.
What is an option file?
An option file is a text file containing a list of all MPI programs you wish to start as well as their command line arguments (one program per line). You must use the -f flag to indicate that you are using an option files. A sample file might look like this:
-n 2 /mpi/programs/mpitest
-n 3 /mpi/programs/mpitest_c

Be sure that all of these programs have been registered with Legion and that they use a common MPI_COMM_WORLD. You can include any legion_mpi_run flags except -f.

Note that if you use an option file Legion will ignore the -n flag, program name, and any arguments on the command line. Any <flags> will be treated as default settings and applied to all of your programs (unless otherwise specified in the options file).

What if I don't use an option file?
In that case you'll need to specify (with -n) how many hosts you want the program(s) to run on, the program class full context path, and any necessary arguments.

You can't start multiple programs without an option file.

How can I control which hosts run my program(s)?
By default Legion will randomly select one or more of the available compatible hosts. You can use the -h flag to name a set of hosts on which your program(s) will run. Notice that this does not point to a particular host but to a set: it is accompanied by a context that contains the names of one or more hosts (e.g., /hosts/MPIhosts/).

You can also use a specification file.

How can I track my program's instances?
While your program is running, you can view the running objects with legion_ls. By default all of your program's instance LOIDs can be found in Legion context space at /mpi/instances/<program_name>:
legion_ls /mpi/instances/<program_name>

In a secure Legion system the default context is /mpi/<user_name>/mpi/instances/<program_name>.

You can get more detailed information (host, vault, state, object address) with legion_list_instances:

legion_list_instances /mpi/programs/<program_name>

You can destroy all of a program's instances with legion_destroy_instances. Be aware, though that this command will destroy all of the class's instances.

I want to put my program's instances in a specific context
By default all of your program's instances are in /mpi/instances/<program_name> or (if Legion security is enabled) /mpi/<user_name>/mpi/instances/<program_name>. If you want to put them in a different context, you can use the -p flag.
I need to control placement of the first process
Use -0 with the host context name. Process zero will run on this node.
I need all of my children to be in a specific directory on the remote host(s)
Use -d, along with the Unix path, to tell Legion to move all children to the specified directory before they begin to run.
What is a specification file?
A specification file is a text file that tells Legion which hosts you wish to create your objects on and the maximum number of objects that can be created on each host. You must include either the -HF (if the file is in your local directory space) or -hf (if the file is in context space) to indicate that you are using a specification file. You can use legion_make_schedule to create this file or do it yourself.

If you create it by hand, each line should contain the hosts' context path (e.g., /hosts/<name>) an integer (if no integer is listed the default is 1). You can list a host more than once, in which case the integer values accumulate.

My program expects input files -- how do I get them to the remote host?

You must make sure that these files are visible from either local file space or context space and you must use either -in or -IN to tell Legion to copy the contents of your input files to the remote host. The copied files will be placed in the local directory of the remote host and given the same file name. Once the program is finished, the copied files are deleted.

You need to use -a if you need to get these files to some or all of your nodes. The default setting passes input files to node 0 only.

You can use these flags multiple times to indicate multiple input files or you can use wild cards to identify groups of files.

My program will produce output files -- how do I get them back?

Use either -out or -OUT to tell Legion to look for these files after the program calls MPI_Finalize or MPI_Abort and to copy them back to your local file space or context space.

You need to use -a if you need to pick up files from some or all of your nodes. The default setting checks node 0 only.

The new local files will have the same name as the remote output files, unless you used -a. In that case, the local copies will be named <file/context name>.<mpi object id>.

If the program crashes midway through, any existing output files will remain in the program's current working directory. If an output file is expected but does not appear, Legion will create an empty file of the expected name in your local directory.

You can use these flags multiple times to pick up multiple output files or you can use wild cards to identify groups of files.

How do I use wild cards?
You use wild cards to work with groups of input/output files. The following wild cards can be used with -in/-out and -IN/-OUT:
*	match 0 or more characters
?	match any one character
[-]	match any character listed between the brackets
        (use these to specify a range of characters)
\	treat the character as a literal

For example, if you wanted to identify done.1, done.2, done.3 ... done.9 as your inputs, you could use square brackets to identify them as a group:

$ legion_mpi_run -n 2 -IN done.[0-9] \ 
    /mpi/programs/mpiFoo

You can use wild cards on the command line or in an option file. They can only be used with file names, however, not with directories.

Can I set stdin/stdout/stderr on the remote hosts I'm using?
Yes. You can map std* to Legion file objects in context space with the -stdin, -stdout, and -stderr flags, or to local files with -STDIN, -STDOUT, and -STDERR. If you are usng -stdout or -stderr, standard output/error will first be written to a file in the remote host's local directory then copied to your context space. The remote host's file will have the same name as the new context file object (i.e., if you specify -stdout outputFoo the remote host's file will be called outputFoo).

If you are running on multiple nodes, use the -A flag to pass these files to all nodes. The default setting is to pass these files to only node 0. If you are using -stdout or -stderr, standard output/error will first be written to a file in the remote host's local directory then copied to your context space. The remote host's file will be called <context name>.<mpi object id>.

Can I set environment variables for MPI processes?
The -D flag lets you set an environment variable for all MPI processes after they have called MPI_Init(). You can use the flag more than once to set multiple variables.
What if a program or host crashes?
The more objects you run and the longer you run them, the greater the chances that a host, application, or network will fail. By default, if a host fails Legion will try to find another compatible host. If a job fails, partial results will be available but the job will not be restarted.

Alternatively, you can run Legion MPI in fault tolerant mode. In that mode, if the a job fails it automatically restarts itself. In the same mode, you can use the -R flag to restart the job from the most recent checkpoint if the host fails. If you use this flag, be sure that you also use -h, so that Legion starts it on a host of your choosing.

Can I use the fault tolerant mode with all of my applications?
No. The fault tolerant mode utilitizes a checkpoint library for SPMD-style (Single Program Multiple Data) applications. It requires a few changes to your program and should be limited to SPMD applications only, since we exploit SPMD applications' regularity. They are characterized by regular communication patterns and each task, typically, is responsible for a region of data and periodically exchanges boundary information with neighbor tasks.
How do I adapt my program to use the checkpoint library?
The library provides checkpoint management support, such as fault detection, checkpoint storage management, and application recovery. You are responsible for placing checkpoints consistently (the natural places at are the top or bottom of the main loop) and writing functions that will save and restore the checkpoint state.

For example, here is a piece of code included in the standard Legion distribution at $LEGION/src/ServiceObjects/MPI/ft_examples. The first fault tolerance function it calls is MPI_FT_Init(), which initializes the checkpointing library and returns TRUE if the object is in recovery mode. In recovery mode, Legion restores the last consistent checkpoint.

//
// Save state. State consists of the iteration count.
//
do_checkpoint(int iteration) {
        // pack state into a buffer
                MPI_Pack(&iteration, 1, MPI_INT, user_buffer,
                     1000, &position, MPI_COMM_WORLD);
        
        // save buffer into checkpoint
                MPI_FT_Save(user_buffer, 20);

        // done with the checkpoint
                MPI_FT_Save_Done();
}

//
// State consists of the iteration count
//
do_recovery(int rank, int &start_iteration) {
        // Restore buffer from checkpoint
                size = MPI_FT_Restore(user_buffer, 1000);

        // Extract state from buffer
                MPI_Unpack(user_buffer, 64, &position,
                 &start_iteration, 1,
                MPI_INT, MPI_COMM_WORLD);
}

//
// C pseudo-code
// Simple SPMD example
//
// mpi_stencil_demo.c
//
main(int argc, char **argv) {
        // declaration of variables omitted...

        // MPI Initialization
        MPI_Init (&argc, &argv);
        MPI_Comm_rank (MPI_COMM_WORLD, &nodenum);
        MPI_Comm_size (MPI_COMM_WORLD, &nprocs);

        // Determine if we are in recovery mode
        recovery = MPI_FT_Init(nodenum, start_iteration);
        if (recovery)
            do_recovery(nodenum, start_iteration);
        else
            start_iteration = 0;

        for (iteration=start_iteration; 
             iteration < NUM_ITERATIONS; ++iteration) {

        exchange_boundary();

        // ...do some work here...

           // take a checkpoint every 10th iteration
           if (iteration%10==0)
                do_checkpoint(iteration);
        }
}

In do_checkpoint(), you save state by packing variables into a buffer, via MPI_Pack(), and then calling MPI_FT_Save(). To save more than one data item, you can call MPI_Pack() several times, packing several items into a buffer, and then call MPI_FT_Save(). If the state to be saved is very large, you can break it down into chunks and call MPI_FT_Save() for each chunk. When all datas items for your checkpoints have been called, you can call MPI_FT_Save_Done().

In do_recovery(), you recover state. The sequences of MPI_Unpack() and MPI_FT_Restore() must be mirror images of the sequences of MPI_Pack() and MPI_FT_Save() in do_checkpoint(). The API for C and Fortran is in section 10.1.10.2 in the Basic User Manual.

Do I need to do anything special to compile a C application with the checkpoint library?
You'll need to link it to the -lLegionMPIFT and -lLegionFT libraries. There is a sample makefile and application in the standard Legion distribution package at $LEGION/src/ServiceObjects/MPI/ft_examples.
Do I need to do anything special to compile a Fortran application with the checkpoint library?
You'll need to link it to the -lLegionMPIFT, -lLegionFT, and -lLegionMPLFTF libraries. There is a sample makefile and application in the standard Legion distribution package at $LEGION/src/ServiceObjects/MPI/ft_examples.
OK, I've adapted my code. What do I do to get it started?
You'll need a CheckpointStorage object. If you haven't yet done it, Run the legion_ft_initialize script on your command line (you only need to run this script once, since each user only needs one CheckpointStorage object).

Follow these steps:

  1. Compile your application.

  2. Register it with Legion.

  3. Create an instance of the CheckpointStorage object in your context space:
    legion_create_object /CheckpointStorage /home/<user_name>/Foo

  4. Run the applicatio, using the fault tolerant flags. The example below specifies a ping time of 200 seconds and a reconfiguration time of 500 seconds.
    legion_mpi_run -n 2 -ft -s /home/<user_name>/Foo -g 200 \
      -r 500 mpi/programs/<program_name>

  5. Once the application has successfully finished, you should delete the CheckpointStorage object:
    legion_rm -f /home/<user_name>/Foo

What are the fault tolerant flags?
There are five flags, which must be used in specific combinations.
-ftTurn on fault tolerance
-s <checkpoint server>Specifies the checkpoint server to use.
-R <checkpoint server>Recovery mode. If a host fails the entire application will automatically restart.
-g <ping interval>Ping interval. Default is 90 seconds.
-r <reconfiguration interval>When failure is detected (an object has not responded in the last x seconds) restart the application from the last consistent checkpoint. Default value is 360 seconds.

You must use the -ft in all cases and you must use either -s or -R. The -g and -r flags are optional.

Can I deactivate my program?
No, sorry.
Can I debug my instances?
Yes, although you'll need to run another command. The legion_mpi_debug command will return a list of all of the program's instances and what each one is doing. Please note that this command is acting on Legion instances, not on the program itself or on its input and output files.
Some hints to make life easier
  • We suggest that you run in verbose mode, especially if your program runs for more than a few seconds. You can use up to four -v flags.

  • Use the -S flag to see the program statistics at exit.

  • You must use an option file if you want to start multiple applications in a single run.

  • If you are using the fault tolerant mode, don't set the ping and timeout values too low: since ping and timeout values determine an object's state, a low timeout value may prematurely declare an object dead.

  • The fault tolerant mode does not handle file recovery. If you are running an application that writes data to files, you are responsible for recoving file pointers.

  • Note that in fault tolerant mode data that is packed or unpacked using MPI_Pack() and MPI_Unpack()will not be properly converted in heterogeneous networks.

Other FAQs

Last modified: Wed Jun 20 11:10:46 2001

 

[Home] [General] [Documentation] [Software]
[Testbeds] [Et Cetera] [Map/Search]

This work partially supported by DOE grant DE-FG02-96ER25290, Logicon (for the DoD HPCMOD/PET program) DAHC 94-96-C-0008, DOE D459000-16-3C, DARPA (GA) SC H607305A, NSF-NGS EIA-9974968, NSF-NPACI ASC-96-10920, and a grant from NASA-IPG.

legion@Virginia.edu
http://legion.virginia.edu/