Some FAQs About Lewis


How many jobs may I run simultaneously?

LSF will allow a user up to 48 CPUs at a time, if they are available.  (When the load on Lewis is low, this limit may be raised to 64.)  A single job that requests more than 48 CPUs will be queued but will never run unless a request is made to the UMBC.  Up to 200 more CPUs may be requested, with the same limit of 48 per job, but those jobs will remain pending until enough of the user's running jobs end and LSF decides it is that user's turn.  Note also that if more than 248 CPUs are requested at once, it is likely that no CPUs will be awarded at all, no matter how many jobs they are spread across, because jobs are often left pending before they can run.  If all your jobs are pending and more than 200 CPUs are requested, none of the jobs will be submitted.
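The limits above can be summarized as a small pre-submission check. This is a minimal Python sketch, not an LSF tool: `check_requests` and the constant names are hypothetical, and the numbers are the ones quoted in this FAQ, which may change (for example when the running limit is raised to 64).

```python
# Limits as quoted in this FAQ (assumption: they may change, e.g. the
# running limit is sometimes raised to 64 when the load on Lewis is low).
MAX_CPUS_PER_JOB = 48
MAX_RUNNING_CPUS = 48
MAX_PENDING_CPUS = 200
MAX_TOTAL_CPUS = MAX_RUNNING_CPUS + MAX_PENDING_CPUS  # 248


def check_requests(cpus_per_job: list[int]) -> list[str]:
    """Return warnings for a planned batch of jobs (one CPU count per job).

    Hypothetical helper: it only applies the rules of thumb in this FAQ
    and does not query LSF.
    """
    warnings = []
    for i, n in enumerate(cpus_per_job):
        if n > MAX_CPUS_PER_JOB:
            warnings.append(
                f"job {i} asks for {n} CPUs (> {MAX_CPUS_PER_JOB}); "
                "it will be queued but will never run")
    total = sum(cpus_per_job)
    if total > MAX_TOTAL_CPUS:
        warnings.append(
            f"{total} CPUs requested in total (> {MAX_TOTAL_CPUS}); "
            "likely no CPUs will be awarded")
    return warnings
```

For example, `check_requests([48, 48, 48])` returns an empty list, since each job is within 48 CPUs and the 144-CPU total is under 248, while `check_requests([64])` warns that the job will never run.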

How can I make my program use more than one processor?

A program will not use more than one CPU unless it is threaded, and even then it cannot use more CPUs than a single node has; on Lewis that means 4 CPUs.  To use more than 4 CPUs, the program must use MPI (Message Passing Interface).  Writing programs with these features demands time and talent, so unless that is your bent, find some way around it.  For more information about how to run MPI jobs, see Running MPI Jobs on Lewis.
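The arithmetic behind these limits, as a minimal Python sketch assuming the 4-CPU nodes described above; `usable_cpus` and `nodes_needed` are hypothetical helpers, not part of any Lewis or LSF tool.

```python
import math

CPUS_PER_NODE = 4  # per this FAQ; check against current Lewis hardware


def usable_cpus(requested: int, threaded: bool, mpi: bool) -> int:
    """How many of the requested CPUs a program can actually keep busy."""
    if mpi:
        return requested                      # MPI processes can span nodes
    if threaded:
        return min(requested, CPUS_PER_NODE)  # threads are confined to one node
    return 1                                  # a serial program uses one CPU


def nodes_needed(cpus: int) -> int:
    """Minimum number of 4-CPU nodes that can hold `cpus` CPUs."""
    return math.ceil(cpus / CPUS_PER_NODE)
```

So a threaded program given 16 CPUs can keep only 4 of them busy, while an MPI program given 16 CPUs needs at least 4 nodes.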

How can I see the number of CPUs and the amount of memory my program is using?

If you are logged in to the Lewis head node, the "top" command will only show what is happening there, and the head node is not where you should be running your program.  To check on the compute node(s) where your program is running:
  1. Type "bjobs -w"
  2. Find the node(s) your job is using under the EXEC_HOST column.
  3. To view the performance of your program on a compute node, e.g. compute-20-17, type "lsrun -m compute-20-17 -P top"
  4. To exit, type "q"; you will return to the node you came from.
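If you have many running jobs, the lookup in steps 1 to 3 can be scripted. This is a minimal Python sketch, assuming the default "bjobs -w" column order (JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME) and that multi-node jobs list their hosts separated by colons, optionally prefixed with a slot count such as 4*compute-20-17. The exact format can differ between LSF versions, so check it against your own output; the function names are hypothetical.

```python
def running_exec_hosts(bjobs_w_output: str) -> dict[str, list[str]]:
    """Map each running job ID to the distinct hosts listed under EXEC_HOST.

    Assumes the column order JOBID USER STAT QUEUE FROM_HOST EXEC_HOST ...
    """
    hosts_by_job = {}
    for line in bjobs_w_output.splitlines()[1:]:   # skip the header line
        fields = line.split()
        if len(fields) < 6 or fields[2] != "RUN":
            continue                                # pending jobs have no EXEC_HOST
        hosts = []
        for entry in fields[5].split(":"):          # e.g. "4*compute-20-17"
            host = entry.split("*")[-1]             # drop an optional "N*" slot count
            if host not in hosts:                   # keep each node once, in order
                hosts.append(host)
        hosts_by_job[fields[0]] = hosts
    return hosts_by_job


def top_commands(hosts_by_job: dict[str, list[str]]) -> list[str]:
    """Build the step-3 command for every node a running job occupies."""
    return [f"lsrun -m {host} -P top"
            for hosts in hosts_by_job.values() for host in hosts]
```

Pass it the text of "bjobs -w", e.g. `subprocess.run(["bjobs", "-w"], capture_output=True, text=True).stdout`, then run the resulting lsrun commands one at a time, since each opens an interactive top session.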

How do I know which node my job is running on when I launch using lsrun?

At present, if you use lsrun to launch your program, there is no way to know which node it has gone to.

My program allows me to choose the number of CPUs.  How do I optimize that number on Lewis?

Trial and error, though some logic can be brought to bear if you know how your program works.  If it does the same thing on every CPU, look to the size of your input for the answer: it makes no sense to ask for more CPUs than there are units of input.  If the program does different things on different CPUs, the answer probably depends on the slowest process in your program, which for most of us leads back to trial-and-error testing.
      For example, when we run mpiblast against our local copy of GenBank, the database is split into 14 pieces and each piece is sent to one of 14 CPUs.  The query sequence is sent to all of those CPUs.  Two extra CPUs handle the overhead of collating the results from the 14, so we usually ask for 16 CPUs, and trials have borne that out as the optimum.  Had we asked for many more CPUs, the scheduler would have been less inclined to let us in right away, so it was in our interest not to ask for too much.  Splitting the database into more pieces did not make it run faster either: more CPUs meant more results to collate, which offset the gain.
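The trial-and-error approach can be made systematic by timing the same test run at several CPU counts. This is a minimal Python sketch: `run` is a hypothetical callable you supply that launches your program with `n` CPUs and blocks until it finishes, and the helper names are likewise hypothetical.

```python
import time
from typing import Callable


def time_runs(run: Callable[[int], None], cpu_counts: list[int]) -> dict[int, float]:
    """Return the wall-clock seconds taken by run(n) for each candidate n."""
    timings = {}
    for n in cpu_counts:
        start = time.perf_counter()
        run(n)                                   # launch with n CPUs and wait for it
        timings[n] = time.perf_counter() - start
    return timings


def best_cpu_count(timings: dict[int, float]) -> int:
    """Pick the fastest CPU count; on a tie, prefer the smaller request,
    which the scheduler is more likely to start right away."""
    return min(timings, key=lambda n: (timings[n], n))
```

Ties go to the smaller request because, as noted above, asking for fewer CPUs tends to get a job started sooner.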

I thought that parallel programs required threads or OpenMP or MPI or something else.

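Threads and OpenMP only spread a program's work across the CPUs within a single node (4 on Lewis).  To pass messages between nodes, Lewis uses MPI.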

How do I know if my program can run on the parallel machine?

Non-parallelized (serial) programs run on Lewis just fine, although each one uses only a single CPU.  You may be able to split your input into pieces and run the program on each piece simultaneously to get results sooner.
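A minimal Python sketch of splitting an input file into pieces, assuming each line is an independent unit of input (multi-line records, such as FASTA sequences, would need to be split on record boundaries instead). The function names and the ".partN" file naming are hypothetical.

```python
from pathlib import Path


def split_evenly(items: list[str], n: int) -> list[list[str]]:
    """Split items into at most n contiguous chunks whose sizes differ by at most one."""
    n = max(1, min(n, len(items)))         # never more chunks than items
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(items[start:end])
        start = end
    return chunks


def write_chunks(input_path: str, n: int) -> list[Path]:
    """Write input_path.part0, input_path.part1, ... treating each line as one unit."""
    lines = Path(input_path).read_text().splitlines(keepends=True)
    paths = []
    for i, chunk in enumerate(split_evenly(lines, n)):
        part = Path(f"{input_path}.part{i}")
        part.write_text("".join(chunk))
        paths.append(part)
    return paths
```

Each part file can then be submitted as its own job with bsub, keeping the total number of CPUs within the limits described at the top of this FAQ.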

How can I kill my bsub job?  See the status of my job?  See how many jobs are running?...

Run "man lsfbatch" to see the tools at your disposal.