Given a random variable $X$, the entropy $H(X)=-\sum_{x\in\text{supp}(P_X)}P_X(x)\cdot\log_2 P_X(x)$ is the average uncertainty about $X$. Suppose some observer sees $X$. Then:
- Uncertainty about $X$ after observing $X$: $H(X|X)=0$.
- Reduction in average uncertainty of $X$ after observing $X$: $H(X)-H(X|X)=H(X)$.
- Uncertainty about $X$ after observing $Y$: $H(X|Y)$.
- Reduction in average uncertainty of $X$ after observing $Y$: $H(X)-H(X|Y)$.
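A minimal numerical sketch of these quantities, assuming a made-up joint distribution `P_XY` (the distribution and variable names are illustrative, not from the notes):

```python
import math

# Hypothetical joint distribution P_XY over X in {a, b} and Y in {0, 1}.
P_XY = {('a', 0): 0.4, ('a', 1): 0.1,
        ('b', 0): 0.2, ('b', 1): 0.3}

def H(p):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

# Marginals P_X and P_Y obtained by summing out the other variable.
P_X, P_Y = {}, {}
for (x, y), q in P_XY.items():
    P_X[x] = P_X.get(x, 0) + q
    P_Y[y] = P_Y.get(y, 0) + q

# Conditional entropy H(X|Y) = -sum_{(x,y)} P_XY(x,y) * log2(P_XY(x,y) / P_Y(y)).
H_X_given_Y = -sum(q * math.log2(q / P_Y[y])
                   for (x, y), q in P_XY.items() if q > 0)

print(H(P_X))                # H(X): average uncertainty about X
print(H_X_given_Y)           # H(X|Y): uncertainty about X after observing Y
print(H(P_X) - H_X_given_Y)  # reduction in average uncertainty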
The mutual information $I(X;Y)$ is the information obtained about one random variable through observing the other:
\[\boxed{I(X;Y)\triangleq H(X)-H(X|Y)=H(Y)-H(Y|X)}\]
On simplifying,
\[\begin{aligned} I(X;Y)&=H(X)-H(X|Y)\\ &=-\sum_{x\in\text{supp}(P_X)}P_X(x)\cdot\log_2{P_X(x)}+\sum_{(x,y)\in\text{supp}(P_{XY})}P_{XY}(x,y)\cdot\log_2{\frac{P_{XY}(x,y)}{P_Y(y)}}\\ &=-\sum_{(x,y)\in\text{supp}(P_{XY})}P_{XY}(x,y)\cdot\log_2{P_X(x)}+\sum_{(x,y)\in\text{supp}(P_{XY})}P_{XY}(x,y)\cdot\log_2{\frac{P_{XY}(x,y)}{P_Y(y)}}\\ &=\sum_{(x,y)\in\text{supp}(P_{XY})}P_{XY}(x,y)\cdot\log_2{\frac{P_{XY}(x,y)}{P_X(x)\cdot P_Y(y)}} \end{aligned}\]
where the second equality uses the marginalization $P_X(x)=\sum_{y}P_{XY}(x,y)$.
\[\implies\boxed{I(X;Y)=\sum_{(x,y)\in\text{supp}(P_{XY})}P_{XY}(x,y)\cdot\log_2\left(\frac{P_{XY}(x,y)}{P_X(x)\cdot P_Y(y)}\right)=D(P_{XY}\,\|\,P_X\otimes P_Y)}\]
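Continuing the sketch above (reusing the hypothetical `P_XY`, `P_X`, `P_Y`, `H`, and `H_X_given_Y` from the previous snippet), the two boxed expressions can be checked numerically:

```python
# Mutual information via the definition I(X;Y) = H(X) - H(X|Y).
I_def = H(P_X) - H_X_given_Y

# Mutual information via the KL divergence D(P_XY || P_X (x) P_Y).
I_kl = sum(q * math.log2(q / (P_X[x] * P_Y[y]))
           for (x, y), q in P_XY.items() if q > 0)

assert abs(I_def - I_kl) < 1e-12  # both routes agree, as derived above
```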
Now suppose we have a binary source $X\in\{a,b\}$ with probability distribution $P_X$. Suppose the observer observes one instance of $X$ and then wants to communicate it through some noise-free medium which can only carry bits in $\{0,1\}$.
In general, we need 1 bit. For example, $\begin{cases}a\mapsto 0\\ b\mapsto 1\end{cases}$
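A minimal sketch of such a 1-bit fixed-length code (the `encode`/`decode` helpers are hypothetical names, not from the notes):

```python
CODE = {'a': '0', 'b': '1'}          # the example mapping a -> 0, b -> 1
DECODE = {v: k for k, v in CODE.items()}

def encode(symbols):
    # One bit per source symbol, regardless of P_X.
    return ''.join(CODE[s] for s in symbols)

def decode(bits):
    return [DECODE[b] for b in bits]

assert decode(encode(['a', 'b', 'b'])) == ['a', 'b', 'b']
```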
Case 1:
Case 2: