Distribution of discovered motifs with respect to Z-score threshold
We analyze how the number of motifs discovered in yeast promoters depends on the Z-score cutoff threshold of the WordSpy algorithm. It turns out that the number of motifs per promoter decreases faster than linearly as the threshold increases. An analysis such as this can help in choosing an appropriate Z-score threshold. The results are shown below.

The X-axis is the motif Z-score threshold, and the Y-axis is the average number of motifs per gene promoter. The error bars give the 95% confidence interval. The plot shows that when the Z-score threshold is set to 5, around 10 motifs appear in each promoter on average. Assuming that one of every two predictions is a false positive, this leaves about 5 true motifs per promoter, so 10 motifs is a reasonable number for each promoter. Our results also show that all of the known motifs that were discovered have Z-scores above this threshold. We are very interested in further experimental and theoretical analysis of this subject in our future research.
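
The quantity plotted above can be sketched in a few lines of Python. This is only an illustration and not the WordSpy implementation: the promoter names and Z-scores are made-up toy values, the function names are hypothetical, the cutoff is assumed to be inclusive (Z >= threshold), and the 95% confidence interval is assumed to be the normal approximation (1.96 standard errors of the mean), which may differ from how the error bars were actually computed.

```python
import math
import statistics


def motifs_per_promoter(hits, threshold):
    """For each promoter, count the motifs whose Z-score is >= threshold."""
    return [sum(1 for z in zscores if z >= threshold) for zscores in hits.values()]


def mean_with_ci95(counts):
    """Return (mean, half-width of a 95% CI) using the normal approximation."""
    mean = statistics.mean(counts)
    if len(counts) < 2:
        return mean, 0.0  # no spread can be estimated from a single promoter
    sem = statistics.stdev(counts) / math.sqrt(len(counts))
    return mean, 1.96 * sem


# Hypothetical toy input: promoter name -> Z-scores of motifs found in it.
hits = {
    "promoter_A": [3.2, 5.1, 7.8, 12.0],
    "promoter_B": [4.9, 6.3, 9.5],
    "promoter_C": [2.1, 5.0, 5.5, 8.8, 15.2],
}

for threshold in (3, 5, 7, 9):
    counts = motifs_per_promoter(hits, threshold)
    mean, half_width = mean_with_ci95(counts)
    print(f"Z >= {threshold}: {mean:.2f} +/- {half_width:.2f} motifs per promoter")
```

Sweeping the threshold over a finer grid and plotting the mean with its interval would give a curve of the same kind as the figure, from which a cutoff can be read off.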