Submitting Job Arrays on Lewis
A job array is a structure that allows execution of the same program with the same resources multiple times but with different input files.
LSF (Load Sharing Facility), the batch scheduler on Lewis, is configured to allow each user up to 64 CPUs running at one time, with up to 200 more CPUs pending. (The value of 64 may be decreased at times of high demand. Any change will be announced in the Lewis login message.) One job array can therefore contain jobs requesting no more than 264 CPUs in total (64 running plus 200 pending), but the user will not be able to submit any other jobs until some of these are done.
A job array whose elements request more than 264 CPUs in total is placed entirely in the PEND state, from which its jobs never leave until the system administrator kills them.
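Because every element's CPU request counts toward the combined 264-CPU ceiling, the largest array you can submit depends on how many CPUs each element asks for (for example with bsub -n). The short sketch below does that division, assuming the limits quoted above (64 running, 200 pending); if the running limit has been lowered, adjust RUN_LIMIT. The function name max_elements is hypothetical.

```shell
#!/bin/sh
# Sketch: largest job array that stays within the per-user CPU limits.
# RUN_LIMIT and PEND_LIMIT are the values quoted in this guide; the run
# limit may be lowered at busy times (check the login message).
RUN_LIMIT=64
PEND_LIMIT=200

# max_elements CPUS_PER_ELEMENT
# Prints how many elements fit under RUN_LIMIT + PEND_LIMIT (integer division).
max_elements() {
    cpus_per_element=$1
    echo $(( (RUN_LIMIT + PEND_LIMIT) / cpus_per_element ))
}

max_elements 1   # prints 264
max_elements 4   # prints 66
max_elements 8   # prints 33
```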
Each job in the array shares the job ID of the array itself but has a unique array index.
The following is an example of how one might create a simple job array:
bsub -J "myArray[1-10]" ./myProgram
-J "myArray[1-10]" gives the job array a name, and the double quotes (") are required. The indices of the job array are given in square brackets, here 1 to 10. This would run myProgram 10 times, and because no output file is specified, all output would be sent to your mailbox.
LSF provides runtime variables (%I and %J) and an environment variable (LSB_JOBINDEX) that make it easier to handle input and output. In the next example, files named "mmm.output.nn" are created, with the job ID (%J) as the prefix mmm and the job array index (%I) as the suffix nn. Input files numbered 1 to 10 (input.1, input.2, etc.) are presumed to exist and are passed to myProgram as command-line arguments. (The input and output names could also include paths to different directories.)
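bsub -J "myArray[1-10]" -o "%J.output.%I" ./myProgram input.\$LSB_JOBINDEX
Note that the backslash ( \ ) is required to escape $LSB_JOBINDEX: it stops the shell you submit from expanding the variable (which is not set there), so the variable is instead expanded when each element runs, with that element's index.
If myProgram reads standard input and the input files are numbered as above, the command would look like this: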
bsub -J "myArray[1-10]" -o "%J.output.%I" -i "input.%I" ./myProgram
The following commands give different information about a job array with an ID of 77793:
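bjobs 77793
bjobs -A 77793
bjobs "77793[5]"
bhist 77793
The job ID is used in each case, and square brackets appended to the job ID (e.g., "77793[5]") indicate a specific element of the array. bjobs 77793 lists the array's elements, bjobs -A 77793 summarizes the array as a whole (numbers of pending, running, done, and exited elements), bjobs "77793[5]" reports on element 5 alone, and bhist 77793 shows the history of the array's jobs.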
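The backslash in input.\$LSB_JOBINDEX and the square brackets in an element ID such as 77793[5] are both special to the shell you type bsub or bjobs into. A minimal sketch, using echo in place of the LSF commands so that it runs anywhere, shows what the shell hands on in each case. myProgram is the placeholder from the examples above and is never executed; the file 777935 is a hypothetical stray file created only for the demonstration.

```shell
#!/bin/sh
# Sketch: what the submitting shell does to "$" and "[ ]" before bsub or bjobs sees them.
# echo stands in for bsub/bjobs, so no LSF installation is needed.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

unset LSB_JOBINDEX                                  # not set in the shell you submit from
unescaped=$(echo ./myProgram input.$LSB_JOBINDEX)   # expanded now, to nothing
escaped=$(echo ./myProgram input.\$LSB_JOBINDEX)    # passed through literally

touch 777935                                        # hypothetical stray file in the current directory
unquoted=$(echo 77793[5])                           # brackets act as a wildcard and match 777935
quoted=$(echo "77793[5]")                           # quotes keep the element ID intact

printf '%s\n' "$unescaped" "$escaped" "$unquoted" "$quoted"
cd / && rm -rf "$demo_dir"
```

Run as-is, it prints ./myProgram input., then ./myProgram input.$LSB_JOBINDEX, then 777935, then 77793[5]: only the escaped and quoted forms reach the command unchanged.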
Job Array Dependencies
The execution of a job can be made contingent on the completion, or partial completion, of another job or job array. The "-w" flag of bsub holds your job in the PEND state until the dependency expression following the flag is TRUE. In this example, the program "collateData" will not start until every element of the job array called "myArray" has completed successfully (reached the DONE state). Once it runs, it writes output and error files as described elsewhere. Note that if LSF cannot find the job or job array named in the dependency expression, your job submission will fail.
bsub -w 'done("myArray[1-10]")' -J collate -o %J.collate.o -e %J.collate.e ./collateData
Partial completion of a job array may also be used as a condition for running a subsequent job, but the job ID must be given. In this example the job ID is 12345, and the condition becomes TRUE when more than 5 elements of the job array are done. These need not be the first 5 elements of the array, since the elements may run on different nodes and thus encounter different competition for CPUs.
bsub -w "numdone(12345,>5)" -J collate -o %J.collate.o -e %J.collate.e ./collateData
Other logical operators and conditions may be used; they are described in the bsub man page under the topic "-w 'dependency_expression'". The conditions include ended, exit, external, numrun, numpend, numexit, numended, numhold, and numstart.
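collateData is your own program, so its contents are not specified here. As one hypothetical illustration, assuming the array wrote its output with -o "%J.output.%I" as in the earlier examples, a shell version might join the element outputs in index order and report any that are missing (with numdone, some elements may not have finished yet). The function name collate_data and its arguments are assumptions made for this sketch.

```shell
#!/bin/sh
# Hypothetical collateData sketch: join the per-element files written by
# -o "%J.output.%I" into one file, in index order.
# Usage: collate_data JOBID FIRST_INDEX LAST_INDEX COMBINED_FILE
collate_data() {
    jobid=$1
    first=$2
    last=$3
    combined=$4
    : > "$combined"                          # start with an empty combined file
    missing=0
    i=$first
    while [ "$i" -le "$last" ]; do
        part="$jobid.output.$i"
        if [ -f "$part" ]; then
            cat "$part" >> "$combined"
        else
            echo "missing: $part" >&2        # e.g. an element that has not finished yet
            missing=$((missing + 1))
        fi
        i=$((i + 1))
    done
    [ "$missing" -eq 0 ]                     # exit status is non-zero if any file was missing
}

# Example: collate_data 77793 1 10 77793.combined
```

The explicit loop over indices is used rather than cat 77793.output.* because the shell sorts wildcard matches as text, which would place 77793.output.10 before 77793.output.2.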