Submitting Jobs via LSF on Lewis


To execute programs correctly on Lewis, you must schedule them using the Load Sharing Facility (LSF). Logging onto Lewis takes you to the “head” node, which gives access to 128 “compute” nodes. LSF schedules jobs launched on the head node to run on the compute nodes: it queues your job and finds available CPUs for it. Without LSF, your job will run on the head node rather than on one or more of the compute nodes.

You may have up to 64 CPUs running at one time, with 200 additional CPUs pending. (This number may be reset during periods of high usage. If so, a notice will be visible upon logging into Lewis.) If the total number of CPUs requested by a user exceeds 264, all of that user's jobs will enter the PEND state and will remain there until the system administrator intervenes. This occurs regardless of how many jobs the user submitted. The scheduling policies attempt to fully utilize system resources without over-committing them, while being fair to all users.

Before using a program that is not well understood, run it on some smaller inputs first. See the Monitoring job performance section below to determine how much memory is being used. You may need to state specific job requirements in the bsub script to make optimal use of LSF. For more information about running parallel programs on Lewis, see the Lewis FAQs.

LSF    Using Additional Computational Resources via LSF MultiCluster    Monitoring job performance and handling large memory jobs    Using the lsrun Command   

LSF

LSF, the Load Sharing Facility, is a subsystem for submitting, monitoring, and controlling a workload of batch jobs across compute servers in a cluster. With LSF, jobs can be scheduled for execution across the lewis compute nodes according to scheduling policies that attempt to fully utilize system resources without over-committing those resources, while being fair to all users. For more information about the LSF batch system, see the online manual page, which can be viewed by executing the command:
man lsfbatch
Long-running jobs on lewis must be submitted to run under LSF. Short test runs can be run from the interactive login session, but they should be limited to no more than 10 minutes of CPU time and should be run via the lsrun command (see Using the lsrun Command).

Jobs are submitted to be run under LSF via the bsub command. For complete details on using bsub, see the online manual page, which can be viewed by executing the command:
man bsub
The bsub command has many options and can be used in several different ways. A simple but time- and resource-consuming task, such as compressing a large file, can be run by giving bsub the command to execute in quotes:
bsub "gzip mylargefile.txt"
In this case a large file is compressed with the gzip command. For a large file, this can take about 99% of a CPU and a lot of time.

A simple way to define a batch job for submission via bsub is to create a job script file which contains the command(s) which should be run, along with some options for the bsub command. A sample LSF job script file might contain lines like the following:
#BSUB -J jobname
#BSUB -oo jobname.o%J
#BSUB -eo jobname.e%J
./myprog
where jobname is the name that you want to give to the job. At its simplest, the job script file is just a sequence of commands to be executed, but in general it can be any script. The lines that begin with #BSUB are special comments that specify job parameters to bsub. These parameters can also be given on the bsub command line when the job is submitted, but it is convenient to put them in the job file so that they are not forgotten. In this example, the #BSUB statements specify a job name and file names for standard output and error output. The job's standard output will be placed in a file in your directory named jobname.onnnn, where nnnn is the job number (indicated by the characters %J in the file name) that is assigned when the job is submitted. The error output will be placed in a file named jobname.ennnn.
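
The sample job script can also be written and submitted from a small wrapper. The following is only a sketch: the file name myjob.lsf, the job name myjob, and the program ./myprog are placeholders rather than names defined on lewis, and the wrapper calls bsub only if it is found on the PATH.

```shell
#!/bin/sh
# Write a minimal LSF job script. "myjob.lsf" and "myjob" are placeholder names.
script=myjob.lsf

cat > "$script" <<'EOF'
#BSUB -J myjob
#BSUB -oo myjob.o%J
#BSUB -eo myjob.e%J
./myprog
EOF

# Submit only where LSF is installed; elsewhere just report what was written.
if command -v bsub >/dev/null 2>&1; then
    bsub < "$script"
else
    echo "bsub not found; job script left in $script"
fi
```

The here-document delimiter is quoted ('EOF') so that the shell leaves %J and the rest of the script text untouched.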

If you prefer to have the job output sent to you via email, you can specify an email address instead of specifying output and error file names:
#BSUB -J jobname
#BSUB -u userid@missouri.edu
./myprog
If you do not specify a destination for the job output, it will be sent by default to your mailbox on lewis. There is a limit to how many outputs can be stored that way. Using pine to open the files in your mailbox will allow you to save them to your local directory.

You can also use #BSUB comments to specify any other bsub command options that you want to include, and multiple options can be specified on a single #BSUB comment line.

If your job will use more than one CPU, you must specify the number of CPUs required by using the -n option of the bsub command. For example, if you want to run a parallel program that will use 16 CPUs, you would include the following line in your job script file:
#BSUB -n 16
The lewis cluster has 128 nodes with 4 CPUs each, so in theory a job could use up to 512 CPUs, although in general, not all of them will be available at any given time. But a program will not be able to use more than 4 CPUs unless it is designed for a multi-node parallel environment. This generally requires the use of parallel programming tools, such as MPI. For more information about how to run MPI jobs, see Running MPI Jobs on Lewis.

Programs that use multi-threading with shared memory to run on multiple CPUs can use a maximum of 4 CPUs, and all of those CPUs must be on the same node. To ensure that LSF allocates all of the CPUs for a multithreaded program on the same node, use the -R option of bsub to specify a resource requirement for a single host. For example:
#BSUB -n 4
#BSUB -R "span[hosts=1]"
Please note that the number of CPUs that you specify for a job is very important for job scheduling on lewis. Job scheduling is based upon the load level of the system's compute nodes and the number of CPUs that are in use. The LSF scheduler cannot "look inside" of a job to determine how many CPUs it will actually use, so LSF must be told how many CPUs a job will use via the -n option for the bsub command when the job is submitted. Please be sure that the number of CPUs that you request is correct for your job. If the number of CPUs is not specified, LSF will assume that 1 CPU is needed. If you request more CPUs than your job will actually use, then your job may wait in the queue, even though enough CPUs are available for it. If you request fewer CPUs than your job actually uses, then your job may be cancelled. Since lewis is shared by many users, jobs that request very large numbers of CPUs may have to wait longer in the job queue, depending upon system load.
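
One way to keep the requested and the actual CPU counts in step is to have the job script derive its thread count from what LSF allocated. The sketch below makes two assumptions that should be checked on lewis before relying on it: that LSF sets the LSB_HOSTS environment variable to one host name per allocated CPU (this can be confirmed with echo $LSB_HOSTS inside a job), and that the program takes its thread count from OMP_NUM_THREADS, as OpenMP programs do. The program name ./myprog is a placeholder.

```shell
#!/bin/sh
#BSUB -n 4
#BSUB -R "span[hosts=1]"

# Count the CPUs LSF allocated. Assumption: LSB_HOSTS holds one host name per
# allocated CPU, separated by spaces. Outside LSF it is unset and we report 1.
count_cpus() {
    set -- ${LSB_HOSTS:-localhost}
    echo "$#"
}

OMP_NUM_THREADS=$(count_cpus)
export OMP_NUM_THREADS
echo "running with $OMP_NUM_THREADS thread(s)"

# ./myprog    # placeholder: replace with the real multithreaded program
```

With this arrangement the CPU count appears only once, on the #BSUB -n line, so changing the request also changes the number of threads the program starts.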

Once the job script file has been created, it can be submitted for execution via the bsub command, as follows:
bsub < scriptfile
The bsub command will respond with a line like the following:
Job <nnnn> is submitted to default queue <normal>.
where nnnn is the job number assigned to the job. The job number can be used to identify your job with other LSF commands, such as bjobs, bhist, bpeek, and bkill. See the man pages for those commands for complete details. The job number is also used to identify the output files for your job if you set up your job script file like the first example above. Use the bjobs command to list your jobs that are queued and executing on lewis. For example:
bjobs -a
See the man page for bjobs for complete details.
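
When jobs are submitted from a script, it can be handy to capture the job number from the reply line shown above. This is a minimal sketch: parse_job_id is a hypothetical helper name, the job number 1234 is made up, and the pattern assumes the reply has exactly the form quoted above.

```shell
#!/bin/sh
# Extract the job number from bsub's reply, e.g.
#   Job <1234> is submitted to default queue <normal>.
parse_job_id() {
    printf '%s\n' "$1" | sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

if command -v bsub >/dev/null 2>&1 && [ -f scriptfile ]; then
    reply=$(bsub < scriptfile)
    jobid=$(parse_job_id "$reply")
    echo "submitted job $jobid"
    bjobs "$jobid"        # show this job's status
    # bpeek "$jobid"      # look at the output produced so far
    # bkill "$jobid"      # cancel the job
else
    # Without LSF, demonstrate the parsing step on a sample reply.
    parse_job_id 'Job <1234> is submitted to default queue <normal>.'
fi
```

If the reply does not match the expected form, parse_job_id prints nothing, so a script can test for an empty result before calling bjobs or bkill.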

When the job is executed, LSF will create an environment for it that is as much like your normal login environment as possible. The job will run under your userid, and the initial working directory will be the directory from which you submitted the job. If the files that your job references are not in the directory from which it was submitted, use relative or full file pathnames as appropriate.

There are several job queues on lewis. But in general, you should not specify a queue name when submitting the job unless asked to do so by the system administrator, for specific types of jobs. Jobs will be routed by default into an appropriate execution queue based upon job resource requirements. One exception is for short test jobs. An LSF queue called short has been created for test runs that will not need more than 15 minutes of CPU time. Jobs in the short queue have a higher priority than normal jobs, but they are limited to 15 minutes of CPU time. You can submit a job to the short queue by specifying the queue name as an option via a #BSUB comment in the job script file,
#BSUB -q short
or on the bsub command when you submit the job:
bsub -q short < scriptfile
Using Additional Computational Resources via LSF MultiCluster

Additional compute nodes in another, smaller cluster are available to run some jobs from the lewis cluster when all of the CPUs in the lewis cluster are busy, and there is a backlog of pending jobs in the job queue. The nodes in the second cluster are best suited for running serial, single CPU jobs, or multi-threaded jobs that use multiple CPUs that are all on the same node. Parallel MPI jobs that have been compiled with the Topspin InfiniBand libraries (the default for compiling MPI jobs on lewis) will not run on the other cluster, as it does not have an InfiniBand network connecting the compute nodes like lewis does.

To allow your serial jobs to run on the second cluster, you must submit them to a special LSF MultiCluster queue called "multi", using the -q option of bsub, by including the following line in your job script file:
#BSUB -q multi
Or you can specify the -q option on the bsub command when you submit the job:
bsub -q multi < scriptfile
Jobs that are submitted to the multi queue will run on lewis if CPUs are available there. They will only be forwarded to the second cluster if there are no CPUs available on lewis.

The software environment on the second cluster should be identical to lewis (except for InfiniBand MPI libraries), and all of your files are also accessible. If you have a job that runs successfully on lewis, but fails when submitted to the multi queue and run on the second cluster, please send details to support@rnet.missouri.edu.

Monitoring job performance and handling large memory jobs

To test the performance of your job you need to know the compute nodes on which it is running.  Use the bjobs -w command to get the list of nodes.  If, e.g., compute-22-28 is identified as one of the nodes your job is running on then use the following command to list the jobs running on that node:
lsrun -P -m compute-22-28 top
Identify your process(es) in the listing of all the jobs on that node and look under the %MEM column to see how much memory you are using.   If you are using more than 25% then you should consider specifying the memory requirement for your job.   This is done by including the following in the script provided to bsub:
#BSUB -R "rusage[mem=nnnn]"
where nnnn is in MB, and is per node.   The maximum available on any node is ~5,700 MB.   If a job spans multiple nodes, each node will have to have nnnn available.
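
To turn a %MEM reading from top into a value for nnnn, a rough conversion like the one below can help. It is only a sketch under a stated assumption: it treats the approximate per-node maximum of 5,700 MB quoted above as the node's memory size, which may differ slightly from the figure top uses as its base. The helper name mem_mb and the 30% reading are made up for illustration.

```shell
#!/bin/sh
# Convert a %MEM reading from top into an approximate figure in MB.
# Assumption: NODE_MB=5700 (the approximate per-node maximum) stands in for
# the node's memory size; adjust it if the nodes differ.
NODE_MB=5700

mem_mb() {
    awk -v pct="$1" -v total="$NODE_MB" 'BEGIN { printf "%d\n", pct * total / 100 }'
}

pct=30    # example %MEM value read from top (made up)
echo "#BSUB -R \"rusage[mem=$(mem_mb "$pct")]\""
```

For a reading of 30% this prints #BSUB -R "rusage[mem=1710]". It is sensible to round the result up somewhat, since memory use can grow over the life of a job.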

One can reserve most of the memory on a (6 GB) node, and prevent competition from other users, by using the following:
#BSUB -n 4
#BSUB -R "rusage[mem=5500] span[hosts=1]"
Even if you are not using all 4 CPUs, this will prevent other jobs from competing with yours for memory. Note: requesting this much memory will tend to delay the execution of your program, as the scheduler must accommodate all users. Do not do this unless you know you need it.

Using the lsrun Command

The lsrun command is used to run an interactive task via LSF. LSF selects the compute node with the lowest CPU and memory load and runs the specified command on that compute node. Use lsrun to execute short test runs of programs, perform large compilations, edit large files, or execute any interactive command that uses a lot of CPU or memory. To execute a command that does not need to interact with the terminal, use this simple form of the lsrun command:
lsrun command [argument...]
By default, lsrun does not create a pseudo-terminal when running the command. So, if you are running a command which performs interactive I/O, such as an editor, or a command which prompts for subcommands, specify the -P option to create a pseudo-terminal:
lsrun -P command [argument...]
For more information about accessing lewis in general, see Accessing Lewis.