{% for g in range(gts|length) %} {% endfor %}
(Labeled) Ground truth motif
Label | Logo | Performance coefficient-g
G{{ gts_indices[g] }}

Dataset:

This is data {{jaspars}} from the JASPAR database.


Performance coefficients:

Let \(G_i=G_i^f\cup G_i^b\) denote the set of base positions at which the \(i\)th ground truth motif occurs, where \(G_i^b\) is the reverse complement of \(G_i^f\), and let \(D_j\) be the set of base positions of the \(j\)th discovered motif returned by the algorithm.
  • For the \(i\)th ground truth motif, the \(j\)th entry of performance coefficient-g is defined as \(|G_i \cap D_j| / |G_i|\)
    • This measures the extent to which the ground truth \(G_i\) is captured by \(D_j\)
    • The entry NC is the quantity \( |\bigcap_j (G_i \setminus D_j)| / |G_i| \); this measures how much of the ground truth \(G_i\) is not captured by any \(D_j\)
  • For the \(j\)th discovered motif, the \(i\)th entry of performance coefficient-d is defined as \(|G_i \cap D_j| / |D_j|\)
    • This measures the fraction of the learned motif \(D_j\) that overlaps \(G_i\)
    • The entry BG is the quantity \( |D_j \cap B| / |D_j| \) where \(B\) is the set of background base positions; this measures how much of \(D_j\) comes from the background
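The definitions above can be sketched in Python as follows. This is a minimal illustration, not the code that produced this report: the function name `performance_coefficients`, the representation of positions as integers in Python sets, and the toy data are all assumptions.

```python
def performance_coefficients(G, D, B):
    """Compute both coefficient tables from position sets (hypothetical helper).

    G: list of sets, G[i] holds the base positions of the i-th ground truth motif
    D: list of sets, D[j] holds the base positions of the j-th discovered motif
    B: set of background base positions
    """
    coeff_g = []
    for G_i in G:
        # |G_i ∩ D_j| / |G_i| for every discovered motif D_j
        row = [len(G_i & D_j) / len(G_i) for D_j in D]
        # NC: positions of G_i that no D_j covers
        not_captured = G_i.difference(*D)
        row.append(len(not_captured) / len(G_i))
        coeff_g.append(row)

    coeff_d = []
    for D_j in D:
        # |G_i ∩ D_j| / |D_j| for every ground truth motif G_i
        row = [len(G_i & D_j) / len(D_j) for G_i in G]
        # BG: positions of D_j that fall in the background
        row.append(len(D_j & B) / len(D_j))
        coeff_d.append(row)
    return coeff_g, coeff_d


# Toy data (assumed): 20 positions, two ground truth motifs, two discovered motifs.
G = [{1, 2, 3, 4}, {10, 11}]
D = [{3, 4, 5, 6}, {11, 12}]
B = set(range(20)) - G[0] - G[1]

coeff_g, coeff_d = performance_coefficients(G, D, B)
```

Each row of `coeff_g` ends with the NC entry and each row of `coeff_d` ends with the BG entry, mirroring the table columns described above.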
For the JASPAR dataset, since the ground truth is labeled, we use the following metrics:
  • Ratio of labeled ground truth being covered: \( \sum_i\sum_j |G_i\cap D_j| / |\bigcup_i G_i| \)
  • Ratio of labeled ground truth not being covered: \( 1-\sum_i\sum_j |G_i\cap D_j| / |\bigcup_i G_i| \)
  • Ratio of the discovered motifs inside the labeled ground truth: \( \sum_i\sum_j |G_i\cap D_j| / |\bigcup_j D_j| \)
  • Ratio of the discovered motifs outside the labeled ground truth: \( 1-\sum_i\sum_j |G_i\cap D_j| / |\bigcup_j D_j| \)
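As a rough sketch, the four ratios can be computed from the same position sets. The function name `jaspar_ratios` and the toy data are assumptions; the dictionary keys reuse the placeholder names that appear later in this template, and their mapping to the formulas is inferred from the labels rather than confirmed by the source. Note that the double sum stays within \([0, 1]\) only when the \(G_i\) are mutually disjoint and the \(D_j\) are mutually disjoint.

```python
def jaspar_ratios(G, D):
    """Coverage ratios for labeled ground truth (hypothetical helper).

    G: list of sets of ground truth base positions
    D: list of sets of discovered-motif base positions
    """
    # Double sum over all (i, j) pairs of |G_i ∩ D_j|
    overlap = sum(len(G_i & D_j) for G_i in G for D_j in D)
    gt_union = set().union(*G)  # ∪_i G_i
    d_union = set().union(*D)   # ∪_j D_j

    gt_cover = overlap / len(gt_union)
    a_cover = overlap / len(d_union)
    return {
        "gt_cover": gt_cover,        # ground truth covered
        "gt_n_cover": 1 - gt_cover,  # ground truth not covered
        "a_cover": a_cover,          # discovered motifs inside ground truth
        "false_cover": 1 - a_cover,  # discovered motifs outside ground truth
    }


# Toy data (assumed): disjoint ground truth motifs and disjoint discovered motifs.
G = [{1, 2, 3, 4}, {10, 11}]
D = [{3, 4, 5, 6}, {11, 12}]
ratios = jaspar_ratios(G, D)
```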

Likelihood ratio scores:

Let \(P_j\) be the position frequency matrix estimated from the \(j\)th learned motif and \(N_j\) be the number of sequences used in the estimation of \(P_j\). The likelihood ratio score of the \(j\)th learned motif of length \(L\) is $$ \sum_{n=1}^{N_j}\sum_{\ell=1}^L \sum_{\alpha} \unicode{x1D7D9}\left[s_n[\ell]=\alpha\right] \, P_j[\alpha,\ell]\, \ln \frac{P_j[\alpha,\ell]}{B[\alpha]}$$ where \(B[\alpha]\) is the background frequency of nucleotide \(\alpha\), \(s_n\) is the \(n\)th substring used in estimating \(P_j\), and \(\unicode{x1D7D9}[\cdot]\) is the indicator function. In this experiment, \(B[\alpha]=1/4\) for all \(\alpha\).
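The score can be illustrated with a short Python sketch that follows the formula term by term. The helper names `estimate_pfm` and `likelihood_ratio_score`, the plain relative-frequency estimate without pseudocounts, and the toy substrings are assumptions, not the implementation behind this report.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def estimate_pfm(substrings):
    """Position frequency matrix as a list of {nucleotide: frequency} dicts.

    Assumes all substrings have the same length L and uses plain relative
    frequencies (no pseudocounts).
    """
    n = len(substrings)
    length = len(substrings[0])
    pfm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in substrings)
        pfm.append({a: counts[a] / n for a in ALPHABET})
    return pfm


def likelihood_ratio_score(substrings, pfm, background=None):
    """Sum over n and l of P[s_n[l], l] * ln(P[s_n[l], l] / B[s_n[l]])."""
    if background is None:
        # Uniform background, B[alpha] = 1/4, as stated in the text
        background = {a: 0.25 for a in ALPHABET}
    score = 0.0
    for s in substrings:
        for pos, a in enumerate(s):
            # The indicator selects the observed nucleotide at each position
            p = pfm[pos][a]
            if p > 0:  # guard in case the PFM came from other data
                score += p * math.log(p / background[a])
    return score


# Toy example (assumed): position 0 is always A, position 1 is C or G.
substrings = ["AC", "AG"]
pfm = estimate_pfm(substrings)
score = likelihood_ratio_score(substrings, pfm)
```

In this toy case the fully conserved first position contributes \(2\ln 4\) and the split second position contributes \(\ln 2\), so the total is \(5\ln 2\).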
{% for i in range(discovered[1]|length) %} {% endfor %} {% for d in range(discovered|length) %} {% for i in range(discovered[d]|length) %} {% endfor %} {% endfor %}
Discovered motif
Label | Logo | Performance coefficient-d
D{{ d_indices[d][i] }}
Learned PWM



  • Ratio of labeled ground truth being covered: {{gt_cover}}
  • Ratio of labeled ground truth not being covered: {{gt_n_cover}}
  • Ratio of the discovered motifs inside the labeled ground truth: {{a_cover}}
  • Ratio of the discovered motifs outside the labeled ground truth: {{false_cover}}


Likelihood ratio scores: