About DeCypher


What is in DeCypher and why should I use it?

DeCypher has most of the tools for the two main types of sequence similarity searching: pairwise searches (e.g., BLAST, Smith-Waterman) and profile-type searches (including HMMs).  It is many times faster than BLAST run on most machines and accepts much larger input than the NCBI web site does.  The UMBC maintains and regularly updates many publicly available databases, and some locally made ones, for the DeCypher tools to use, including GenBank and UniProt.  Other custom databases can be installed.

The UMBC has DeCypher installed on four Sun Solaris computers, so up to four jobs can run at a time.  Specialized hardware installed on each machine rapidly identifies similar sequences; a software step then processes the results into alignments and statistics.

Below is a screenshot of the web interface.  Each type of algorithm (e.g., Tera-Blast P) is matched with the possible combinations of query and target.


DeCypher1

Will DeCypher TeraBlast programs give me the same results as NCBI's BLAST programs?

Not necessarily.  TeraBlastN is optimized to find nearly identical sequences rapidly.  For weaker similarities encoded in nucleic acids, the translating programs (tblastn and blastx in TeraBlastP) are better.  DeCypher works in two phases.  In its “hardware” phase it runs an ungapped version of the pertinent flavor of BLAST (blastn for TeraBlastN, blastp for TeraBlastP, etc.) to identify the best-scoring sequences very rapidly.  In its “software” phase it aligns and processes only those top hits using a version of Smith-Waterman (SW), which is slow but optimal.  The default is Banded SW, but other variations of SW are available (Full, Double Affine, Semi-global, Ungapped).  Because the final alignment is an optimal SW alignment, DeCypher's scores may differ from those of gapped BLAST at NCBI, and the targets are not necessarily ranked in the same order.
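
The two-phase idea described above can be illustrated with a small Python sketch.  This is not DeCypher's implementation: the function names, the toy scoring (+1 match, -1 mismatch, -2 per gap), and the `keep` cutoff are all assumptions made for the example.  It ranks targets by a fast ungapped score, keeps the top few, and rescores only those with a gapped Smith-Waterman score.

```python
def ungapped_score(query, target, match=1, mismatch=-1):
    """Best ungapped local segment score over all diagonals (phase 1, fast filter)."""
    best = 0
    for offset in range(-(len(query) - 1), len(target)):
        run = 0
        for qi in range(len(query)):
            ti = qi + offset
            if 0 <= ti < len(target):
                step = match if query[qi] == target[ti] else mismatch
                run = max(0, run + step)  # restart the segment when it goes negative
                best = max(best, run)
    return best


def gapped_score(query, target, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty (phase 2)."""
    prev = [0] * (len(target) + 1)
    best = 0
    for qc in query:
        cur = [0]
        for j, tc in enumerate(target, 1):
            s = max(0,
                    prev[j - 1] + (match if qc == tc else mismatch),
                    prev[j] + gap,
                    cur[j - 1] + gap)
            cur.append(s)
            best = max(best, s)
        prev = cur
    return best


def two_phase_search(query, database, keep=2):
    """Shortlist by ungapped score, then rescore the shortlist with gapped SW."""
    shortlist = sorted(database,
                       key=lambda name: ungapped_score(query, database[name]),
                       reverse=True)[:keep]
    rescored = [(name, gapped_score(query, database[name])) for name in shortlist]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Hypothetical toy database: "c" carries one extra base relative to the query.
query = "ACGTACGT"
database = {"a": "TTACGTACGTTT", "b": "CCCCCCCC", "c": "TTACGTAACGTTT"}
hits = two_phase_search(query, database)
```

In this toy case target "c" scores 5 in the ungapped phase but 6 once a gap is allowed, which shows how the final scores can differ from those of the filtering step.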


What is TeraProbe?

Use it to compare your oligonucleotides with nucleic acid databases.


What can GeneDetective do?

GeneDetective aligns genomic DNA with protein, protein HMM, or coding DNA to allow visualization of splice sites, introns, alternative splice forms, or the relations of SNPs to amino acid sequence.  It starts by using a TeraBlast procedure to identify the top hits of an ungapped alignment, and for that reason it aligns only closely matched sequences.


I want to install my own database for use with DeCypher.  How do I do that?

Custom databases can be installed.  We do not allow users to make their own databases, because one user could overwrite another’s database by giving it the same name.  Instead, send us the sequences and we will make the database.


How can I do a multiple sequence alignment?

ClustalW will do that but has a limit of 30 sequences.

What can Smith-Waterman do?

Smith-Waterman is guaranteed to find the optimal local alignment between two sequences.  It is slower than TeraBlast.
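
For readers who want to see how the algorithm works, here is a minimal Python sketch of Smith-Waterman with a simple linear gap penalty.  The scoring values and the function name are assumptions chosen for illustration; DeCypher's own variants (Banded, Double Affine, etc.) use different scoring schemes and are not reproduced here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below zero, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("TTTACGTTT", "GGACGGG")
```

Every cell of the matrix is filled, so the run time grows with the product of the two sequence lengths; that is the reason it is slower than a heuristic search.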


What are the HMM and ProfileSearch tools for?

The HMM (Hidden Markov Model) and ProfileSearch tools are useful for identifying domains and protein families, repetitive elements, and regulatory regions, and for predicting 3D structures.  HMMs and Profiles represent a domain or consensus sequence in a position-specific manner.  While the BLAST and Smith-Waterman sequence alignment tools are used to find similarity across relatively long stretches of sequence, the profile-based techniques are most useful for finding similarities between smaller pre-defined regions, even if the rest of the sequence is dissimilar.  A commonly used example: blastp might find little similarity between human hemoglobin A and leghemoglobin, but a properly defined HMM can find the heme-binding sites in each and so group the two proteins into the same family.  The difference between HMM and ProfileSearch is that HMM models have a basis in probability theory, whereas ProfileSearch models are built heuristically.  For that reason HMMs are usually preferred.  If you want a more detailed description of these models and their differences, try these chapters from the GCG manual:  Profile Hidden Markov Model Analysis, Profile Analysis.
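
To make "position-specific" concrete, here is a small Python sketch that builds a log-odds scoring matrix from a few aligned motif instances and slides it along a sequence.  It is a simplified profile, not an HMM (there are no insert or delete states), and it is not the method used by DeCypher or GCG; the function names, the DNA alphabet, the uniform background, and the pseudocount are assumptions made for the example.

```python
import math


def build_profile(aligned, alphabet="ACGT", pseudocount=1.0):
    """Build one log-odds score table per alignment column."""
    width = len(aligned[0])
    background = 1.0 / len(alphabet)  # assumed uniform background frequency
    profile = []
    for pos in range(width):
        column = [seq[pos] for seq in aligned]
        total = len(column) + pseudocount * len(alphabet)
        profile.append({
            ch: math.log2(((column.count(ch) + pseudocount) / total) / background)
            for ch in alphabet
        })
    return profile


def best_window(profile, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(profile[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Hypothetical motif instances, already aligned and of equal length.
profile = build_profile(["TATA", "TATA", "TATT", "TACA"])
start, score = best_window(profile, "GGGGTATAGGGG")
```

Because each column has its own scores, a short conserved region can be found even when the flanking sequence (here the runs of G) shares nothing with the model.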


What are the FrameSearch versions of SW, HMM, and ProfileSearch about?

These allow for frame shifts in the translation of DNA to protein, so that less similar DNA sequences can be detected.  This is useful when comparing very divergent species.  Because of the translation and the additional frames to search, these searches are slower.
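
The effect of a frame shift can be seen in a short Python sketch.  It translates DNA in the three forward reading frames using the standard genetic code; the function names and the example sequences are made up for illustration, and the reverse strand is left out for brevity.  It only shows why a search restricted to one frame loses the protein after an inserted base; it does not implement the FrameSearch alignment itself.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate one forward frame (0, 1, or 2); unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def forward_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


original = "ATGGCCAAAGGGTTT"                 # translates to MAKGF in frame 0
shifted = original[:6] + "T" + original[6:]  # one inserted base after codon 2
frames = forward_frames(shifted)
```

After the insertion, frame 0 still begins with "MA" but then hits a stop codon, while the rest of the protein ("KGF") reappears in frame 1.  A frame-shift-tolerant search can follow the protein from one frame into the other.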


What does the actual query interface look like?

Here is a screenshot of the interface for Tera-Blast N, which is similar to all the other interfaces.  Enter your email address to have results returned to you as attachments or as notices of a URL from which the results can be downloaded.  The output format and manner are specified in the Return Results drop-down menus.  The query sequences can be input by browsing to a file or by pasting into the box.  One strand, the other, or both strands can be matched in the target.  The list of regularly updated databases includes GenBank, NR, The Gene Index Project (formerly at TIGR, now at Harvard), UniProt, Amigo, Arabidopsis, and more.  At the bottom are the various job options to choose from.


DeCypher2