Identifying Yeast cell-cycle transcription
factor binding motifs
As another application of WordSpy, we used it to discover
TFBMs in the regulatory regions of about 800 cell-cycle related genes
of S. cerevisiae. The cell-cycle gene names are from http://genome-www.stanford.edu/cellcycle/data/rawdata/
and the promoter sequences were retrieved with RSA tools (http://rsat.scmbb.ulb.ac.be/rsat/).
After removing homologs and dubious genes, the input for this experiment
contained 645 promoter sequences. The FASTA file is
available here (cleaned yeast cell cycle promoters).
To evaluate the quality of a motif (that is, whether it is biologically
meaningful), we measure the coherence of the expression profiles of the
genes whose promoters contain that motif. We use the average coherence
over all pairs of genes associated with a motif and call this measure
the G-score. The yeast gene expression
data are from http://cmgm.stanford.edu/~kimlab/multispecies/Data/yeast.zip.
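The G-score idea above can be sketched in a few lines of Python. This is a minimal illustration, not the WordSpy implementation: the text does not say which coherence measure was used, so Pearson correlation between expression profiles is assumed here, and the names `pearson`, `g_score`, `motif_genes`, and `expression` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumption: Pearson correlation stands in for the (unspecified)
    pairwise coherence measure.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0  # flat profile: treated as uncorrelated (assumption)
    return sum(a * b for a, b in zip(dx, dy)) / denom


def g_score(motif_genes, expression):
    """Average pairwise coherence of the genes whose promoters contain a motif.

    motif_genes: gene names associated with the motif (hypothetical input).
    expression:  dict mapping gene name -> list of expression values.
    """
    profiles = [expression[g] for g in motif_genes if g in expression]
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0  # fewer than two genes with data: no pairs to average
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)
```

Under this sketch, a motif whose genes rise and fall together gets a score near 1, while a repeat such as GAAAAAA that occurs in promoters of unrelated genes would be expected to score near 0.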
The motifs discovered by WordSpy were reordered based on their G-scores.
Interestingly, most known motifs are ranked high in our dictionary, while
many obvious repeats with very high Z-scores, such as GAAAAAA, can
be identified as not biologically significant and removed from the
dictionary because of their low G-scores. We also performed a whole-genome
analysis of the specificity of the motifs, measured by Zg-scores, using the
promoters of all the genes of S. cerevisiae. Most known TFBMs
are also ranked high by Zg-score.
To facilitate motif selection in a real application, we
clustered similar motifs. The motifs were first sorted by Zg-score or
G-score. Going from the highest rank to the lowest, we took each motif
that had not yet been clustered as a seed and grouped it with all the motifs
that shared a common substring of length 6 with the seed or its reverse complement.
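The seed-based clustering described above can be sketched as follows. This is an illustrative reading of the procedure rather than the original code: it assumes motifs are plain A/C/G/T strings, that the input list is already sorted by score (best first), and that each motif joins only the first cluster it matches. The function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only, by assumption)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def substrings(seq, k):
    """Set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_motifs(ranked_motifs, k=6):
    """Greedy clustering of motifs given in rank order, best first.

    Each unclustered motif in turn becomes a seed; every remaining
    unclustered motif sharing a length-k substring with the seed or
    with its reverse complement joins the seed's cluster.
    """
    clusters = []
    clustered = set()
    for seed in ranked_motifs:
        if seed in clustered:
            continue
        seed_kmers = substrings(seed, k) | substrings(reverse_complement(seed), k)
        members = [seed]
        clustered.add(seed)
        for other in ranked_motifs:
            if other in clustered:
                continue
            if substrings(other, k) & seed_kmers:
                members.append(other)
                clustered.add(other)
        clusters.append(members)
    return clusters
```

Because seeds are taken in rank order, the first motif of each cluster is its highest-scoring member, which makes it a natural representative when browsing the cluster lists.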
The detailed results are shown below.
Results:
Identified known motifs and their ranks.
All putative motifs for yeast cell-cycle genes ordered by G-score.
Putative motif clusters based on G-score ranking.
Putative motif clusters based on Zg-score ranking.
The dictionaries built by WordSpy: