{% for g in range(gts|length) %} {% endfor %}
Ground truth motif
Label    Logo    performance coefficient-g
G{{ gts_indices[g] }}

Binding patterns:

Example FASTA file    Example FASTA file with ground truth


The ground truth motif has {{ modes }} mode(s)
    {% for i in range(modes) %}
  • Mode {{i+1}} ({{ mode_strs[i] }}) occurs in each sequence with probability {{mixture_weights[i]}}; half of the time it occurs in the reverse-complement orientation
    • {% if mode_str_pairs[i]|length != 0 %} {% for j in range(mode_str_pairs[i]|length) %} {% if gap_str_pairs[i]|length > 0 %}
    • {{mode_str_pairs[i][j][0]}} and {{mode_str_pairs[i][j][1]}} can be 0 to {{gap_str_pairs[i][j]}} nucleotides apart
    • {% endif %} {% endfor %} {% endif %}
    {% endfor %}
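
As a rough illustration of the binding pattern described above, the following Python sketch samples one occurrence of a two-part (or multi-part) mode. The function names `reverse_complement` and `sample_instance`, the argument layout, and the choice to fill gaps with uniformly random nucleotides are assumptions made for this example; they are not taken from the code that generated the data.

```python
import random

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sample_instance(parts, max_gaps, rng):
    """Sample one occurrence of a mode (hypothetical helper).

    parts    -- list of motif parts, e.g. ["ACG", "TTA"]
    max_gaps -- maximum gap between consecutive parts (len(parts) - 1 values)
    rng      -- a random.Random instance
    """
    pieces = [parts[0]]
    for part, max_gap in zip(parts[1:], max_gaps):
        gap = rng.randint(0, max_gap)  # gap length, inclusive on both ends
        # Assumption: gap positions are filled with uniform random bases.
        pieces.append("".join(rng.choice("ACGT") for _ in range(gap)))
        pieces.append(part)
    instance = "".join(pieces)
    # Half of the time the occurrence is reverse-complemented.
    if rng.random() < 0.5:
        instance = reverse_complement(instance)
    return instance
```

Calling `sample_instance(["ACG", "TTA"], [2], random.Random(0))` returns a string of 6 to 8 nucleotides in which the two parts are separated by at most two bases, in either the forward or the reverse-complement orientation.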

Performance coefficients:

Let \(G_i\) denote the set of base positions at which the \(i\)th ground truth motif occurs, and let \(D_j\) be the set of base positions of the \(j\)th discovered motif returned by the algorithm
  • The \(j\)th entry of performance coefficient-g for \(G_i\) is defined as \(|G_i \cap D_j| / |G_i|\)
    • This measures the extent to which the ground truth \(G_i\) is captured by \(D_j\)
    • The entry NC is the quantity \( |\bigcap_j (G_i \setminus D_j)| / |G_i| \); this measures how much of the ground truth \(G_i\) is not captured by any \(D_j\)
  • The \(i\)th entry of performance coefficient-d for \(D_j\) is defined as \(|G_i \cap D_j| / |D_j|\)
    • This measures the fraction of the learned motif \(D_j\) that falls within \(G_i\)
    • The entry BG is the quantity \( |D_j \cap B| / |D_j| \), where \(B\) is the set of background base positions; this measures how much of \(D_j\) comes from the background
  • The performance-coefficient is defined as $$ \frac{\sum_{i,j} |G_i\cap D_j|}{|\bigcup_i G_i\, \cup\, \bigcup_j D_j|}$$
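
The definitions above translate directly into set operations. The sketch below is a minimal Python illustration, assuming each \(G_i\), \(D_j\) and \(B\) is given as a Python set of base positions (here, hypothetical `(sequence index, offset)` tuples); the function name `performance_coefficients` is invented for this example.

```python
def performance_coefficients(gts, discovered, background):
    """Compute the three quantities defined above (illustrative sketch).

    gts        -- list of sets G_i of ground truth base positions
    discovered -- list of sets D_j of discovered-motif base positions
    background -- set B of background base positions
    """
    # performance coefficient-g: one row per G_i, last entry is NC.
    coeff_g = []
    for G in gts:
        row = [len(G & D) / len(G) for D in discovered]
        # Intersecting (G \ D_j) over all j equals G minus every D_j.
        not_captured = G.difference(*discovered)
        row.append(len(not_captured) / len(G))
        coeff_g.append(row)

    # performance coefficient-d: one row per D_j, last entry is BG.
    coeff_d = []
    for D in discovered:
        row = [len(G & D) / len(D) for G in gts]
        row.append(len(D & background) / len(D))
        coeff_d.append(row)

    # Overall performance coefficient: summed overlaps over the joint union.
    overlap = sum(len(G & D) for G in gts for D in discovered)
    union = set().union(*gts, *discovered)
    perf = overlap / len(union)
    return coeff_g, coeff_d, perf
```

For instance, with a single ground truth motif at offsets 3 to 6 of sequence 0 and a single discovered motif at offsets 5 to 8, half of each set overlaps the other, so both coefficient rows are `[0.5, 0.5]` and the overall performance coefficient is 2/6.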

Likelihood ratio scores:

Let \(P_j\) be the position frequency matrix estimated from the \(j\)th learned motif and \(N_j\) be the number of sequences used in the estimation of \(P_j\). The likelihood ratio score of the \(j\)th learned motif of length \(L\) is $$ \sum_{n=1}^{N_j}\sum_{\ell=1}^L \sum_{\alpha} \unicode{x1D7D9}\left[s_n[\ell]=\alpha\right] \, P_j[\alpha,\ell]\, \ln \frac{P_j[\alpha,\ell]}{B[\alpha]}$$ where \(B[\alpha]\) is the background frequency of nucleotide \(\alpha\), \(s_n\) is the \(n\)th substring used in estimating \(P_j\), and \(\unicode{x1D7D9}[\cdot]\) is the indicator function. In this experiment, \(B[\alpha]=1/4\) for all \(\alpha\).
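
The score above can be evaluated with a few lines of Python. The sketch below is only an illustration under stated assumptions: the function name `likelihood_ratio_score` is hypothetical, the position frequency matrix is estimated as plain per-column frequencies without pseudocounts, and all substrings are assumed to have the same length \(L\) over the alphabet A, C, G, T.

```python
import math

ALPHABET = "ACGT"


def likelihood_ratio_score(substrings, background=None):
    """Evaluate the likelihood ratio score defined above (illustrative sketch).

    substrings -- the N_j equal-length substrings s_n used to estimate P_j
    background -- dict of background frequencies B[alpha]; uniform 1/4 if None
    """
    if background is None:
        background = {a: 0.25 for a in ALPHABET}
    n = len(substrings)
    length = len(substrings[0])

    # Position frequency matrix P_j[alpha][l], estimated without pseudocounts
    # (an assumption of this sketch).
    pfm = {
        a: [sum(s[l] == a for s in substrings) / n for l in range(length)]
        for a in ALPHABET
    }

    score = 0.0
    for s in substrings:
        for l, a in enumerate(s):
            # The indicator keeps only alpha = s_n[l]; p > 0 because s itself
            # contributed to the count at this position.
            p = pfm[a][l]
            score += p * math.log(p / background[a])
    return score
```

As a sanity check, two identical substrings `"ACGT"` give \(P_j[\alpha,\ell]=1\) at every observed base, so the score is \(2 \cdot 4 \cdot \ln 4 = 8\ln 4\).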
{% for i in range(discovered[1]|length) %} {% endfor %} {% for d in range(discovered|length) %} {% for i in range(discovered[d]|length) %} {% endfor %} {% endfor %}
Discovered motif
Label    Logo    performance coefficient-d
D{{ d_indices[d][i] }}
Learned PWM



Performance-coefficient: {{perf_coeff}}

Likelihood ratio scores: