BLAST on Lewis
The fastest way to run BLAST on Lewis is to use mpi. One limitation, however,
is that nucleotide sequences must be less than ~120 KB in length. If that is a problem, consider using the
tcp/ip connection instead. It, however, is limited to 4 cpus, all of which must be on one node.
For more information about using Lewis, see Lewis FAQs or Running MPI Jobs
on Lewis. If you were looking for WU-BLAST, go here.
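Because the ~120 KB limit decides which of the two methods applies, it can help to check sequence lengths before choosing. The following is a minimal sketch, not a tool installed on Lewis: long_seqs is a hypothetical helper name, and the default limit of 120000 residues is an assumed reading of "~120 KB".

```shell
#!/bin/bash
# List the sequences in a FASTA file that are longer than a limit.
# Usage: long_seqs FILE [LIMIT]
# LIMIT defaults to 120000, an assumed reading of the "~120 KB" limit.
long_seqs() {
    awk -v limit="${2:-120000}" '
        # Print the previous record if it exceeded the limit.
        function report() { if (seen && len > limit) print name "\t" len }
        /^>/ { report(); seen = 1; name = substr($1, 2); len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { report() }
    ' "$1"
}
```

Running "long_seqs $HOME/query.fa" prints one line per over-length sequence (name, tab, length) and prints nothing when every sequence is within the limit.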
A. Running BLAST with mpi.
- Databases properly formatted for mpiBlast on Lewis are listed under the
directory /dfs/databases/mpi-db. To query one of them create a file like this one in your home directory:
[<userid>@lewis ~]$ cat file1
mpirun.lsf -np 16 /evbio/src/mpiBlast.last/bin/mpiblast -d gball -p blastn -v 10 -b 0 -i $HOME/query.fa -o $HOME/query.gball.out
Note that the path to the query file must contain your userid; in this example $HOME supplies it. The "-d" flag names the database, where here gball indicates all of GenBank. Other databases are listed here. Finally, make the file executable with the following command: chmod 755 file1
- Create a second file that calls the first:
[<userid>@lewis ~]$ cat file2
#BSUB -J gb
#BSUB -n 16
#BSUB -a mvapich2
#BSUB -oo %Jgb.o
#BSUB -eo %Jgb.e
./file1
The "-J" option gives the job a name. The "-a mvapich2" option is required for LSF integration with InfiniBand. The "-n" option indicates the number of cpus. We have found that mpiBlast runs best with 16 cpus; more will not help. The "-oo" and "-eo" options create output and error files, respectively, in your home directory, with %J replaced by the job number. More can be learned about the bsub options by using the command "man bsub".
- Create a third file to tell mpiBlast where to find things. You must call it ".ncbirc":
[<userid>@lewis ~]$ cat .ncbirc
[NCBI]
Data=/evbio/src/mpiBlast/build/ncbi/data
[mpiBLAST]
Shared=/dfs/databases/mpi-db
Local=/home/userid/data/temp
Note that in the last line you must replace "userid" with your own userid, and that listing a file whose name begins with the character "." requires the "-a" option, as in "ls -la".
- To submit the blast program indicated in file1 you need to type and enter at the command line:
[<userid>@lewis ~]$ bsub < file2
You will immediately see a response like this:
Job <nnnn> is submitted to default queue
Where nnnn is the job number. If you want to see your job status, type and enter:
[<userid>@lewis ~]$ bjobs -w
Your results will be returned to your home directory with the file names indicated in file2. For the above submission, for example, the results are in two files called nnnngb.e and nnnngb.o, where ".e" refers to error and ".o" refers to output.
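The submission and follow-up steps above can be tied together by reading the job number from the bsub response. This is a sketch under stated assumptions: job_id and result_files are hypothetical helper names, and the pattern assumes the response starts exactly as shown above ("Job <nnnn> is submitted ...").

```shell
#!/bin/bash
# Extract the job number from the line bsub prints on submission,
# e.g. "Job <1234> is submitted to default queue". Reads standard input.
job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Print the output and error file names that file2 produces for a job
# number, following its "%Jgb.o" and "%Jgb.e" patterns.
result_files() {
    printf '%sgb.o\n%sgb.e\n' "$1" "$1"
}

# Example use on Lewis (commented out because it submits a real job):
# id=$(bsub < file2 | job_id)
# bjobs -w "$id"
# result_files "$id"
```

Keeping the job number in a variable avoids copying it by hand when checking status or looking for the two result files.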
B. Running BLAST with tcp/ip.
- Test to see if the NCBI tools are in your path by typing "blastall" at the command line. You should see the
list of options for blast printed on your screen. If so, skip the path setup and go to the step "To run blastn create two files".
- Put the NCBI tools in your path. Assuming you are using the bash shell,
open the .bashrc file in your home directory and add the following line:
export NCBI=/evbio/NCBI/ncbitools/ncbi/build/
- Include the new variable in the PATH statement of the same file so it reads something like this:
export PATH=$PATH:$NCBI
(There may be more variables included in the PATH statement. This is just an example.)
- Now you must exit your session on Lewis and log back in so that the changes to .bashrc take effect.
- Type at the command line:
echo $NCBI
You should see the path indicated as /evbio/NCBI/ncbitools/ncbi/build/
- Repeat the first step. You should see all the options for running BLAST.
- To run blastn create two files. First,
file1 is an executable file that might contain
blastall -p blast -d database -i query -a 4 -v 100 -b 1
Here blast, database, and query are placeholders: you need to state the kind of blast (blastn, blastx, blastp, tblastx, tblastn), the full path and name of the database, and the name of the query file. If you want, e.g., gball, use /dfs/databases/genbank2/gball. Or for NR protein use /dfs/databases/NR/nr. The list of databases is here.
Create file2:
# Set job parameters
#BSUB -a mpichp4
#BSUB -J jobname
#BSUB -oo jobname.o%J
#BSUB -eo jobname.e%J
# Set number of CPUs
#BSUB -n 4
#BSUB -R "span[hosts=1]"
# Start MPI job
mpirun.lsf ./file1
Note that the maximum possible number of cpus is 4. The "span[hosts=1]" argument ensures all 4 cpus are on the same node. This line is required; without it LSF may over-allocate threads to the node your job is running on, causing great inefficiency for you and others.
- To submit the blast program indicated in file1 you need to type and enter at the command line:
[<userid>@lewis ~]$ bsub < file2
See step 4 of Running BLAST with mpi above for more details of the output files.
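Since the "-a" value in file1 and the "#BSUB -n" value in file2 should agree and stay at or below 4, the two files for the single-node blastall run can be generated together. The following is a minimal sketch assuming the options shown above; write_blast_job is a hypothetical helper, not something provided on Lewis.

```shell
#!/bin/bash
# Write file1 and file2 for a single-node blastall run into a directory.
# Usage: write_blast_job DIR PROGRAM DATABASE QUERY [CPUS]
# write_blast_job is a hypothetical helper, not a tool installed on Lewis.
write_blast_job() {
    local dir="$1"
    local program="$2"
    local db="$3"
    local query="$4"
    local cpus="${5:-4}"

    # The text above gives 4 as the maximum number of cpus for this method.
    case "$cpus" in
        1|2|3|4) ;;
        *) echo "cpus must be between 1 and 4" >&2; return 1 ;;
    esac

    # file1: the blastall command; -a is kept equal to the cpus requested below.
    printf 'blastall -p %s -d %s -i %s -a %s -v 100 -b 1\n' \
        "$program" "$db" "$query" "$cpus" > "$dir/file1"
    chmod 755 "$dir/file1"

    # file2: the LSF job script, using the same options as the example above.
    cat > "$dir/file2" <<EOF
# Set job parameters
#BSUB -a mpichp4
#BSUB -J jobname
#BSUB -oo jobname.o%J
#BSUB -eo jobname.e%J
# Set number of CPUs
#BSUB -n $cpus
#BSUB -R "span[hosts=1]"
# Start MPI job
mpirun.lsf ./file1
EOF
}
```

For example, "write_blast_job $HOME blastn /dfs/databases/genbank2/gball $HOME/query.fa" writes both files to the home directory, after which the job is submitted with "bsub < file2" as described above.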