13.0 Batch queue host objects

The standard Legion host object creates objects using the process creation interface of the underlying operating system. However, some systems require using a queue management system to take full advantage of local resources. For example, some parallel computers contain a small number of "interactive" nodes, which can be accessed through normal means, and a large number of "compute" nodes, which can only be reached by submitting jobs to a local queue management system.

To make use of hosts that are managed by local queuing systems, Legion provides a modified host object implementation called the BatchQueueHost. Instead of using the standard process creation interface of the underlying operating system, BatchQueueHost objects submit jobs to the local queuing system.

13.1 Starting a batch queue host object

To start a BatchQueueHost object, the standard legion_starthost command is used with the -B flag to indicate the desired host object implementation. For example:

$ legion_starthost -B BatchQueueHost -N /hosts/SP2 \
  SP2.university.edu

13.2 Setting the local queue

The BatchQueueHost can be used with a variety of queue systems (LoadLeveler, Codine, PBS, and NQS are the currently supported queue types). The type of local queue a given BatchQueueHost object will use to manage the local objects is specified in an object attribute set on the BatchQueueHost.

If the host object started in the above example is supposed to use the local "LoadLeveler" queue, the legion_update_attributes command can be used:

$ legion_update_attributes /hosts/SP2 \
  -a "host_queue_type('LoadLeveler')"

Currently, each BatchQueueHost can use only one queue type at a time. If multiple local queuing systems are available, they cannot all be used by the same BatchQueueHost: a separate BatchQueueHost must be started to represent each queue. Typically, though, an individual machine is managed by a single queue.
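If a machine does have more than one queuing system, the one-host-per-queue rule could be scripted along the following lines. This is only a sketch: the start_batch_host function, the run helper, the DRY_RUN variable, and the context paths /hosts/SP2-LoadLeveler and /hosts/SP2-PBS are illustrative names, not part of Legion. Only the two commands and the flags shown in sections 13.1 and 13.2 are used. By default the script prints the commands instead of running them.

```shell
# Print commands by default; set DRY_RUN=0 to execute them for real.
# (run and DRY_RUN are illustrative helpers, not part of Legion.)
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Note: the printed line omits the shell quoting around arguments.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Start one BatchQueueHost and set its queue type.
# Arguments: context path, machine name, queue type.
start_batch_host() {
    context_path=$1
    machine=$2
    queue_type=$3
    run legion_starthost -B BatchQueueHost -N "$context_path" "$machine"
    run legion_update_attributes "$context_path" \
        -a "host_queue_type('$queue_type')"
}

# One host object per queue type (context paths are hypothetical).
start_batch_host /hosts/SP2-LoadLeveler SP2.university.edu LoadLeveler
start_batch_host /hosts/SP2-PBS SP2.university.edu PBS
```

Review the printed commands before setting DRY_RUN=0, since the right context paths and machine names depend on your site.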

13.3 Before running objects on the new host

By default every Legion class contains a desired_host_property attribute specifying that it be run on an interactive host:

desired_host_property('interactive')

This attribute signals the scheduler that the class's instances should not run on BatchQueueHosts. You can use the legion_list_attributes command to check whether it is set. The default is based on the conservative assumption that any class can run on interactive hosts, but not all classes can run on batch hosts.

To allow instances of your class to run on BatchQueueHosts, you can just remove this attribute:

$ legion_update_attributes my_class -d \
  "desired_host_property('interactive')"

13.4 Troubleshooting

If you are having trouble creating objects on a BatchQueueHost, there are several things to check. First, be sure that you've removed the problem class's interactive desired_host_property (see section 13.3). If you still have trouble, you may have a misconfigured host object. Check the following points to be sure that your host object is set up correctly.

  • The right "queue type" attribute should be set on the host. You can use the legion_list_attributes command to check this. If the attribute is not set and the host has a LoadLeveler queue, for instance, you should run the following command:

    $ legion_update_attributes /hosts/my_host -a \
      "host_queue_type('LoadLeveler')"

  • This "queue type" attribute points the host object to the location of the local Legion queue management scripts. The above command tells the host to look in $LEGION/bin/QueueManagementScripts/LoadLeveler. If the queue type attribute is host_queue_type('Codine'), the host would look for the queue management scripts in its $LEGION/bin/QueueManagementScripts/Codine.
  • The appropriate corresponding directory must be in the host's $LEGION/bin/QueueManagementScripts. It should contain the following three queue management scripts:

            legion_queue_submit
            legion_queue_status
            legion_queue_cancel
  • These scripts should have execute permissions set for the user-id that will be running the BatchQueueHost.
  • If all of this is set up correctly, the host should be calling the local scripts. If objects are still not being created correctly there may be a problem in the scripts.
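
The directory and permission checks in the list above can be sketched as a small shell function. The function name check_queue_scripts is illustrative, not part of Legion, and the sketch assumes $LEGION is set in the environment. Because the -x test checks permissions for the current user, run it as the user-id that runs the BatchQueueHost.

```shell
# Report problems with the queue management scripts for one queue type.
# Prints one line per problem; returns 0 only if all checks pass.
# Example: check_queue_scripts LoadLeveler
check_queue_scripts() {
    queue_type=$1
    script_dir="$LEGION/bin/QueueManagementScripts/$queue_type"
    status=0

    if [ ! -d "$script_dir" ]; then
        echo "missing directory: $script_dir"
        return 1
    fi

    # The three scripts named in the list above.
    for script in legion_queue_submit legion_queue_status legion_queue_cancel; do
        if [ ! -f "$script_dir/$script" ]; then
            echo "missing: $script_dir/$script"
            status=1
        elif [ ! -x "$script_dir/$script" ]; then
            echo "not executable: $script_dir/$script"
            status=1
        fi
    done
    return $status
}
```

No output and a zero exit status mean the directory and all three scripts are in place; whether the scripts themselves work correctly is a separate question.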

You can get a better idea of whether or not the local scripts are being called and what they're doing by looking in the log file maintained by the scripts (look in $LEGION_OPR/Legion-BatchLog). You'll find this log on the host where the BatchQueueHost is running. If the logs indicate that the scripts are never called there may be a scheduling problem.
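
One way to look at the most recent log entries is sketched below. It assumes that $LEGION_OPR is set and that Legion-BatchLog is a plain text file; the function name show_batch_log and the default of 20 lines are illustrative choices. The format of the log entries is not described here, so the sketch only prints them.

```shell
# Print the last N lines (default 20) of the batch log.
# Run this on the host where the BatchQueueHost is running.
show_batch_log() {
    log_file="$LEGION_OPR/Legion-BatchLog"
    if [ -r "$log_file" ]; then
        tail -n "${1:-20}" "$log_file"
    else
        echo "no readable log at $log_file" >&2
        return 1
    fi
}

# Example: show_batch_log 50
```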

A new host is also not selected for scheduling until six minutes after it is added to the system, so you may need to wait a few minutes before you can test a new batch queue host.
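
If you start hosts from a script, the waiting period can be handled with a small helper such as the one sketched below. The names SCHEDULING_DELAY and seconds_until_schedulable are illustrative; the 360-second value is the six-minute delay mentioned above.

```shell
SCHEDULING_DELAY=360   # six minutes, in seconds

# Print how many seconds remain before a new host can be scheduled.
# Arguments: host start time and current time, in seconds since the epoch.
seconds_until_schedulable() {
    started=$1
    now=$2
    remaining=$(( started + SCHEDULING_DELAY - now ))
    [ "$remaining" -gt 0 ] || remaining=0
    echo "$remaining"
}

# Example: record the time right after starting the host, then wait.
#   started=$(date +%s)
#   sleep "$(seconds_until_schedulable "$started" "$(date +%s)")"
```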

Directory of Legion 1.6.4 Manuals

legion@Virginia.edu
http://legion.virginia.edu/