Position Letter Frequency

How spelling preferences shift by position in English words

Coverage by position

Word count at each position

Position view

Letters at position 1 (from start)

Letter profile

Distribution for A (from start)

Reliability

Top position-specific letters

Entropy by position

Lower entropy = more predictable letter distribution

Z-score outliers

Top deviations from expected frequency

Near-zero occurrences

Letters nearly absent at specific positions

Vowel density by position

Percentage of vowels (a, e, i, o, u) at each position

Top-3 concentration by position

How much the top 3 letters dominate each position

Method and definitions

Let position \(i \in \{1,\dots,31\}\) and letter \(\ell \in \{a,\dots,z\}\). Let \(n_i\) be the number of words long enough to have a letter at position \(i\), and let \(c_{i,\ell}\) be the number of those words whose letter at position \(i\) is \(\ell\).

count: \(c_{i,\ell}\)

probability: \(p_{i,\ell} = P(\ell \mid i) = \dfrac{c_{i,\ell}}{n_i}\)
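As a minimal sketch of how \(n_i\), \(c_{i,\ell}\) and \(p_{i,\ell}\) could be tabulated from a word list: the function names `position_counts` and `probability`, the `from_end` flag (mirroring the page's direction control), and the toy word list are all assumptions made for illustration, not the page's actual code. Words are assumed to contain only the letters a to z.

```python
from collections import Counter

def position_counts(words, max_pos=31, from_end=False):
    """Return (n, c) where n[i] is the number of words that have a letter at
    position i (1-based) and c[i] is a Counter of the letters seen there.
    Assumes each word contains only the letters a-z (hypothetical helper)."""
    n, c = {}, {}
    for word in words:
        letters = list(word.lower())
        if from_end:
            letters.reverse()  # position 1 becomes the last letter
        for i, ch in enumerate(letters[:max_pos], start=1):
            n[i] = n.get(i, 0) + 1
            c.setdefault(i, Counter())[ch] += 1
    return n, c

def probability(n, c, i, letter):
    """p_{i,letter} = c_{i,letter} / n_i, or 0.0 if no word reaches position i."""
    if n.get(i, 0) == 0:
        return 0.0
    return c[i][letter] / n[i]

# Toy data, invented for this example only.
words = ["cat", "car", "dog", "cab", "at"]
n, c = position_counts(words)
n_end, c_end = position_counts(words, from_end=True)
```

With this toy list, four of the five words reach position 3, so \(n_3 = 4\), and three of the five words start with "c", so \(p_{1,c} = 3/5\).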

baseline: overall letter frequency pooled across all positions, where \(\sum_i n_i\) is the total number of letters counted: \[ p_{\ell} = \frac{\sum_i c_{i,\ell}}{\sum_i n_i} \]

deviation: difference from baseline in percentage points: \[ \Delta_{i,\ell} = (p_{i,\ell} - p_{\ell}) \times 100 \]

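lift: relative rate compared to baseline, so \(L_{i,\ell} > 1\) means the letter is over-represented at position \(i\) and \(L_{i,\ell} < 1\) means it is under-represented: \[ L_{i,\ell} = \frac{p_{i,\ell}}{p_{\ell}} \]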
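The baseline, deviation and lift definitions can be illustrated with a small sketch. The function names and the hand-made counts below are assumptions for illustration; `n` maps position to \(n_i\) and `c` maps position to a dict of letter counts.

```python
def baseline(n, c):
    """p_l = (sum over positions of c[i][l]) / (sum over positions of n[i])."""
    total = sum(n.values())
    letters = {letter for counts in c.values() for letter in counts}
    return {
        letter: sum(counts.get(letter, 0) for counts in c.values()) / total
        for letter in letters
    }

def deviation_and_lift(n, c, base, i, letter):
    """Return (deviation in percentage points, lift) for one position and letter."""
    p_pos = c[i].get(letter, 0) / n[i]
    p_base = base[letter]
    return (p_pos - p_base) * 100, p_pos / p_base

# Invented counts: two positions, four words each.
n = {1: 4, 2: 4}
c = {1: {"s": 3, "e": 1}, 2: {"e": 3, "s": 1}}
base = baseline(n, c)
delta, lift = deviation_and_lift(n, c, base, 1, "s")
```

Here "s" accounts for 4 of the 8 letters overall (baseline 0.5) but 3 of the 4 letters at position 1 (0.75), giving a deviation of +25 percentage points and a lift of 1.5.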

entropy: Shannon entropy at position \(i\): \[ H_i = -\sum_{\ell} p_{i,\ell}\log_2(p_{i,\ell}) \] with the convention \(0\log_2 0 = 0\) for letters that never occur at position \(i\). Maximum is \(\log_2(26)\approx 4.70\) bits (uniform over 26 letters). Lower entropy = more predictable.
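A short sketch of the entropy calculation for a single position, assuming the counts for that position are available as a dict from letter to count (the name `entropy_bits` is hypothetical). Letters with zero count are skipped, which is the usual way to treat \(0\log_2 0\) as 0.

```python
import math

def entropy_bits(counts):
    """Shannon entropy in bits of the letter distribution given by counts."""
    total = sum(counts.values())
    h = 0.0
    for k in counts.values():
        if k > 0:  # skip zero counts: 0 * log2(0) is treated as 0
            p = k / total
            h -= p * math.log2(p)
    return h

uniform_26 = {chr(ord("a") + j): 1 for j in range(26)}
h_max = entropy_bits(uniform_26)           # the log2(26) upper bound
h_four = entropy_bits({"a": 5, "b": 5, "c": 5, "d": 5})
h_single = entropy_bits({"q": 7, "u": 0})  # fully predictable position
```

A uniform distribution over four letters gives exactly 2 bits, and a position where only one letter ever appears gives 0 bits.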

z-score: under a simple binomial null \(c_{i,\ell}\sim \mathrm{Binomial}(n_i,p_{\ell})\): \[ E_{i,\ell}=n_ip_{\ell},\quad \sigma_{i,\ell}=\sqrt{n_ip_{\ell}(1-p_{\ell})},\quad z_{i,\ell}=\frac{c_{i,\ell}-E_{i,\ell}}{\sigma_{i,\ell}} \] Under a normal approximation, \(|z|=3\) corresponds to a two-tailed \(p\approx 0.0027\), so \(|z|>3\) implies \(p<0.0027\). The approximation is rough when \(E_{i,\ell}=n_ip_{\ell}\) is small, as at late positions that few words reach.
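The z-score and its normal-approximation p-value can be sketched with the standard library alone. The function names and the example numbers are assumptions for illustration; the two-tailed p-value uses the identity \(p = \operatorname{erfc}(|z|/\sqrt{2})\) for a standard normal.

```python
import math

def z_score(count, n_i, p_base):
    """z = (c - n*p) / sqrt(n*p*(1-p)) under a Binomial(n_i, p_base) null."""
    expected = n_i * p_base
    sigma = math.sqrt(n_i * p_base * (1 - p_base))
    if sigma == 0:
        return 0.0  # degenerate null (p_base is 0 or 1, or n_i is 0)
    return (count - expected) / sigma

def two_tailed_p(z):
    """Two-tailed p-value for z under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

# Invented example: a letter with baseline 0.10 seen 150 times among 1000 words.
z = z_score(150, 1000, 0.10)
p_at_3 = two_tailed_p(3.0)
```

In this example the expected count is 100 and the standard deviation is \(\sqrt{90}\approx 9.49\), so the observed count sits a little over 5 standard deviations above expectation.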

near-zero: almost absent relative to baseline, i.e. a lift below 0.1: \[ L_{i,\ell} = \frac{p_{i,\ell}}{p_{\ell}} < 0.1 \]
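One way the near-zero rule might be applied across all positions is sketched below. The function name, the data, and especially the optional `min_expected` filter are assumptions that do not come from the page; the filter is off by default and is included only because a zero count at a position reached by very few words is weak evidence.

```python
def near_zero(n, c, base, threshold=0.1, min_expected=0.0):
    """Return (position, letter, ratio) triples where p_{i,l} / p_l < threshold.

    min_expected is a hypothetical extra filter (not part of the stated
    definition): pairs whose expected count n_i * p_l falls below it are skipped.
    """
    flagged = []
    for i in sorted(n):
        for letter, p_base in sorted(base.items()):
            if p_base == 0 or n[i] * p_base < min_expected:
                continue
            ratio = (c[i].get(letter, 0) / n[i]) / p_base
            if ratio < threshold:
                flagged.append((i, letter, ratio))
    return flagged

# Invented counts: two positions, 100 words each.
n = {1: 100, 2: 100}
c = {1: {"q": 10, "x": 2, "e": 88}, 2: {"q": 0, "x": 18, "e": 82}}
base = {"q": 0.05, "x": 0.10, "e": 0.85}
flagged = near_zero(n, c, base)
```

With these counts only "q" at position 2 is flagged: it never occurs there even though its baseline rate would predict about 5 occurrences.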

vowel density: share of vowels at position \(i\): \[ V_i=\sum_{\ell\in\{a,e,i,o,u\}} p_{i,\ell} \]
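A minimal sketch of the vowel density for one position, assuming that position's counts are given as a dict from letter to count (the name `vowel_density` is hypothetical). Following the definition above, "y" is not counted as a vowel.

```python
VOWELS = frozenset("aeiou")

def vowel_density(counts):
    """Share of the letters at one position that are a, e, i, o or u."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(k for letter, k in counts.items() if letter in VOWELS) / total

# Invented counts for a single position.
v = vowel_density({"a": 2, "t": 1, "e": 1, "y": 0})
```

Here three of the four letters are vowels, so \(V_i = 0.75\).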

top-3 concentration: combined probability of the 3 most common letters at position \(i\): \[ C_i=\sum_{k=1}^{3} p_{i,\ell_k} \] where \(\ell_k\) are the top-3 letters by \(p_{i,\ell}\) at that position.
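The top-3 concentration for one position can be sketched the same way, again assuming the counts are a dict from letter to count and using a hypothetical function name. Sorting the counts and summing the three largest is equivalent to summing the three largest probabilities, because every probability shares the denominator \(n_i\).

```python
def top_k_concentration(counts, k=3):
    """Combined probability of the k most common letters at one position."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sorted(counts.values(), reverse=True)[:k]
    return sum(top) / total

# Invented counts for a single position (12 words in total).
conc = top_k_concentration({"s": 5, "c": 3, "p": 2, "a": 1, "b": 1})
```

In this example the three most common letters cover 10 of 12 words, about 0.83. A position where at most three distinct letters ever appear has a concentration of exactly 1.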