Warning: The WordSpy web server is temporarily shut down. To use the software, please download a local version at [Download].

WordSpy allows you to

  1. Discover all over-represented (degenerate) words in a large set of sequences (DNA, RNA, Protein, or English)
  2. Identify discriminative words with negative sequence data
  3. Select biologically meaningful DNA motifs using gene expression data
  4. Evaluate DNA motifs with genome-scale random sampling analysis (in the result page)

Note:

  1. The web server was published in Nucleic Acids Research, 33:W412-6, 2005, Web Server issue. [Paper]
  2. NEW! The method paper was published in Genome Biology, 7(6):R49, 2006. [Paper]
  3. NEW! The local version of the software is available for academic and nonprofit use at [Download]
Your sequences (required):
Enter one or more sequences in fasta format

Or upload a sequence file
Or select a sequence from our database
Your options:

Maximum word length (required): (2~20)

Alphabet set:
Motif mode: allow degeneracy, subtle motifs, on both strands
Count the number of sequences containing the motif
Repeatedly clean up non-significant motifs
The order of tandem repeats to be filtered out:
Motif selection criteria:
Gene expression data for motif ranking (optional):

Enter the gene expression data. Check the format here.

Or upload a gene expression data file
Or select a data set from our database

Note: The gene expression data must be a two-dimensional matrix in which the first column holds the gene names (or IDs) and the first row holds the condition labels. The gene names (or IDs) must match the sequence names in the input sequence file. (detail)
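
Before submitting, it may help to check locally that the gene names in the expression matrix match the sequence names in the FASTA input. The following is a minimal sketch, not part of WordSpy. It assumes the matrix is tab-delimited and that a sequence name is the first whitespace-delimited token after ">" on a FASTA header line; neither detail is stated on this page. The function names and the sample data (geneA, geneB, geneC) are hypothetical.

```python
import csv
import io


def fasta_names(text):
    """Return sequence names: the first token after '>' on each header line.
    (Assumption: the name ends at the first whitespace.)"""
    names = []
    for line in text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            names.append(header.split()[0] if header else "")
    return names


def expression_gene_ids(text, delimiter="\t"):
    """Return (conditions, gene_ids) from a matrix whose first row holds
    condition labels and whose first column holds gene names or IDs.
    (Assumption: tab-delimited text.)"""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    conditions = rows[0][1:]
    gene_ids = [r[0] for r in rows[1:]]
    return conditions, gene_ids


def unmatched(fasta_text, expression_text):
    """Return (sequences without expression data, genes without a sequence)."""
    seqs = set(fasta_names(fasta_text))
    _, genes = expression_gene_ids(expression_text)
    gene_set = set(genes)
    return sorted(seqs - gene_set), sorted(gene_set - seqs)


# Made-up example input, for illustration only.
fasta = ">geneA upstream region\nACGTACGT\n>geneB\nTTGACA\n"
expr = "gene\tcond1\tcond2\ngeneA\t0.5\t-1.2\ngeneC\t1.1\t0.3\n"

missing_expr, missing_seq = unmatched(fasta, expr)
print("Sequences without expression data:", missing_expr)
print("Genes without a sequence:", missing_seq)
```

If either list is non-empty, the corresponding names would need to be reconciled before the two inputs can be used together.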
Negative sequence data for finding discriminative motifs (optional):
Enter one or more sequences in fasta format

Or upload a sequence file
Or select a sequence from our database
