If we are willing to tolerate some small probability of decoding error, we can compress the source better $\implies$ we can use a shorter code to represent the source.
We do this by ignoring source symbols which have a very low probability of occurrence.
We “club” (group together) multiple instances of the RV; say $X_1,X_2,X_3$ are three such instances.
$(X_1,X_2,X_3)\in\{(a,a,a),\dots,(b,b,b)\}=\{a,b\}^3$.
Assume $X_1,X_2,X_3$ are independent RVs.
$P_{X_1,X_2,X_3}(x_1,x_2,x_3)=P_{X_1}(x_1)\cdot P_{X_2}(x_2)\cdot P_{X_3}(x_3)\quad\forall\, x_1,x_2,x_3\in\{a,b\}$
We know the joint distribution from the individual distributions (also called the marginal distributions).
$P_{X_1,X_2,X_3}(x_1,x_2,x_3)=P_{X_1,X_2,X_3}(\vec{x})=p^{n_a(\vec{x})}\cdot(1-p)^{n_b(\vec{x})}$
where $\vec{x}=(x_1,x_2,x_3)$; $n_a(\vec{x})$ is the number of times $a$ occurs in $\vec{x}$; $n_b(\vec{x})$ is the number of times $b$ occurs; and $p$ is the probability of occurrence of $a$.
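As a quick sanity check, here is a minimal Python sketch that evaluates this product-form pmf (the function and variable names are my own, for illustration):

```python
# Minimal sketch: evaluate the i.i.d. joint pmf p^{n_a(x)} * (1-p)^{n_b(x)}
# for sequences over the alphabet {a, b}. Names are illustrative.
from itertools import product

def joint_prob(x: str, p: float) -> float:
    """Probability of sequence x (e.g. 'aab') when each symbol is drawn
    independently with P(a) = p and P(b) = 1 - p."""
    n_a = x.count("a")               # number of a's in the sequence
    n_b = len(x) - n_a               # number of b's in the sequence
    return p**n_a * (1 - p) ** n_b

p = 0.9
print(joint_prob("aaa", p))          # 0.9^3 = 0.729

# Sanity check: the pmf sums to 1 over all of {a,b}^3.
total = sum(joint_prob("".join(s), p) for s in product("ab", repeat=3))
print(total)                         # 1.0 (up to floating point)
```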
Suppose we have a compression scheme $C_S$ (a mapping) for a single RV:
\[C_S:\{a,b\}\mapsto\{0,1\}\]
For three RVs it becomes
\[C'_S:\{a,b\}^3\mapsto\{0,1\}^3\]
This new code $C'_S$, taking 3 bits for 3 RVs, is exactly as good as the original code $C_S$, which takes 1 bit per RV.
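For concreteness, a sketch of such a block code built by concatenating a single-symbol code (the bit assignment $a\to 0$, $b\to 1$ is an arbitrary choice of mine; any bijection works the same way):

```python
# Sketch: extend a 1-bit-per-symbol code to blocks of 3 symbols by
# concatenation. The bit assignment a -> '0', b -> '1' is an arbitrary
# illustrative choice; any bijection {a,b} -> {0,1} behaves the same.
from itertools import product

C_S = {"a": "0", "b": "1"}

# C'_S : {a,b}^3 -> {0,1}^3, still exactly 1 bit per source symbol.
C_prime_S = {"".join(block): "".join(C_S[s] for s in block)
             for block in product("ab", repeat=3)}

print(C_prime_S["aab"])   # '001': 3 bits for 3 symbols, same rate as C_S
```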
To do better than this, we have to use fewer than 3 bits, i.e. 0, 1, or 2 bits.
In the case of encoding only one source symbol, the possible code lengths are just 0 and 1. Here we have more choices: binary strings of length 0, 1, 2, or 3 can be used.
\[C'_S:\{a,b\}^3\mapsto\{0,1\}\]
This is a good code in the following case: one sequence carries almost all of the probability (e.g. $(a,a,a)$ when $p$ is close to 1), so giving it its own codeword and lumping all other sequences under the remaining codeword incurs only a small probability of decoding error; a sketch follows below.

Normalized length: it is the ratio of the length of the source code (in bits) to the number of source symbols encoded together; for the code above it is $1/3$ bit per symbol.

Suppose we are allowed to combine multiple source symbols and encode them together into some fixed-length binary string. This gives a more efficient source code (smaller normalized length).
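A minimal sketch of the lossy code $C'_S:\{a,b\}^3\mapsto\{0,1\}$ above, assuming $p$ close to 1; the codeword assignment ($(a,a,a)\to 0$, everything else $\to 1$) and the dummy decoded sequence are hypothetical choices of mine:

```python
# Sketch of the lossy code C'_S : {a,b}^3 -> {0,1}. Codeword assignment
# is a hypothetical choice: 'aaa' gets its own codeword '0'; every other
# block shares the dummy codeword '1', decoded to an arbitrary fixed
# representative (here 'bbb').
from itertools import product

p = 0.95  # assumed P(a); the scheme is only good when p is close to 1

def encode(block: str) -> str:
    return "0" if block == "aaa" else "1"

def decode(bit: str) -> str:
    return "aaa" if bit == "0" else "bbb"

def prob(block: str) -> float:
    n_a = block.count("a")
    return p**n_a * (1 - p) ** (len(block) - n_a)

# Probability of decoding error = total probability of blocks that are
# not reproduced exactly after encode/decode.
blocks = ["".join(b) for b in product("ab", repeat=3)]
p_err = sum(prob(b) for b in blocks if decode(encode(b)) != b)
print(p_err)   # ~0.1425: small because 'aaa' alone has probability ~0.857
print(1 / 3)   # normalized length: 1 bit for 3 source symbols
```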
Assumptions:
The $n$-length random source vector is represented by $(X_1,X_2,\dots,X_n)$, where $X_i$ is the RV representing the $i^{th}$ output of the source, taking values in $\{a,b\}$.
The following is true for every $i$:
\[\begin{cases}p_{X_i}(a)=p_X(a)\\ p_{X_i}(b)=p_X(b)\end{cases}\implies p_{X_i}=p_X\]
So all $X_i$ have the same distribution and are independent; they are said to be independent and identically distributed (i.i.d.).
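A minimal sketch of such an i.i.d. source in Python (the function name and parameter values are illustrative):

```python
# Sketch: draw n i.i.d. symbols from {a, b} with P(a) = p, P(b) = 1 - p.
import random

def iid_source(n: int, p: float, seed: int = 0) -> str:
    rng = random.Random(seed)
    return "".join("a" if rng.random() < p else "b" for _ in range(n))

x = iid_source(n=10_000, p=0.8)
print(x[:20])                  # first few symbols of the realization
print(x.count("a") / len(x))   # empirical fraction of a's, close to p = 0.8
```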
Question:
Suppose $n$ is a very large number. How many $a$ and $b$ do we expect to see in the random sequence $(X_1,X_2,\dots,X_n)$?
\[\begin{aligned} &\text{Number of }a\text{'s}\approx n\cdot p\\ &\text{Number of }b\text{'s}\approx n\cdot(1-p)\\ &\text{Total number of sequences}=2^n\\ &\text{Number of sequences with }(n\cdot p)\ a\text{'s and }(n\cdot(1-p))\ b\text{'s}={n\choose np} \end{aligned}\]
(The first two lines hold with high probability for large $n$, by the law of large numbers.) So the source code we want to use will encode only these $n\choose np$ sequences with unique codewords; all other sequences are combined under a single dummy codeword.
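Uniquely indexing these $\binom{n}{np}$ sequences takes about $\log_2\binom{n}{np}$ bits; by Stirling's approximation this is roughly $n\,H(p)$ for large $n$, where $H(p)=-p\log_2 p-(1-p)\log_2(1-p)$ is the binary entropy. A quick numeric check (a sketch; the values of $n$ and $p$ are just an example):

```python
# Quick check: bits per symbol needed to uniquely index the C(n, np)
# typical sequences, versus the binary entropy H(p). Illustrative values.
from math import comb, log2

def binary_entropy(p: float) -> float:
    return -p * log2(p) - (1 - p) * log2(1 - p)

n, p = 1000, 0.1
k = round(n * p)                  # expected number of a's
bits_per_symbol = log2(comb(n, k)) / n
print(bits_per_symbol)            # ~0.464 bits/symbol
print(binary_entropy(p))          # H(0.1) ~ 0.469 bits/symbol, well below 1
```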