# CSE5313 Coding and information theory for data science (Lecture 17)
## Shannon's Coding Theorem
**Shannon's coding theorem**: For a discrete memoryless channel with capacity $C$,
every rate $R < C = \max_{p_X} I(X; Y)$ is achievable, where the maximum is taken over all input distributions $p_X$.
### Computing Channel Capacity
$X$: channel input (per channel use); $Y$: channel output (per channel use).
Let the rate of the code be $\frac{\log_{|F|} |C|}{n}$ (or $\frac{k}{n}$ if the code is linear).
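As a quick sanity check of the rate formula, here is a minimal Python sketch; the binary repetition code used as the codebook is an illustrative assumption, not from the lecture.

```python
import math

# Binary repetition code of length n = 3 over F = {0, 1}:
# C = {000, 111}, so |C| = 2.
C = {"000", "111"}
n = 3
q = 2  # alphabet size |F|

rate = math.log(len(C), q) / n
print(rate)  # 0.333..., i.e. k/n = 1/3 (the code is linear with k = 1)
```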
The binary erasure channel (BEC) is the analog of the BSC in which bits are erased rather than corrupted:
each transmitted bit is replaced by the erasure symbol $*$ independently with probability $\alpha$.
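A minimal simulation sketch of the BEC, assuming `*` denotes the erasure symbol; the function name and interface are illustrative.

```python
import random

def bec(bits, alpha, rng=random):
    # Each input bit is independently erased (replaced by '*')
    # with probability alpha; otherwise it passes through unchanged.
    return ['*' if rng.random() < alpha else b for b in bits]

print(bec([0, 1, 1, 0, 1], alpha=0.3))  # e.g. [0, '*', 1, 0, '*']
```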
### Corollary: The capacity of the BEC is $C = 1 - \alpha$.
<details>
<summary>Proof</summary>
$$
\begin{aligned}
C&=\max_{p_X} I(X;Y)\\
&=\max_{p_X} (H(Y)-H(Y|X))\\
&=\max_{p_X} H(Y)-H(\alpha),
\end{aligned}
$$
since given $X$, the output is erased with probability $\alpha$ and equals $X$ otherwise, so $H(Y|X)=H(\alpha)$ (the binary entropy function).
Denote $\Pr(X=1)\coloneqq p$. Then:
$\Pr(Y=0)=\Pr(X=0)\Pr(\text{no erasure})=(1-p)(1-\alpha)$
$\Pr(Y=1)=\Pr(X=1)\Pr(\text{no erasure})=p(1-\alpha)$
$\Pr(Y=*)=\alpha$
So,
$$
\begin{aligned}
H(Y)&=H((1-p)(1-\alpha),\;p(1-\alpha),\;\alpha)\\
&=-(1-p)(1-\alpha)\log_2 ((1-p)(1-\alpha))-p(1-\alpha)\log_2 (p(1-\alpha))-\alpha\log_2 \alpha\\
&=H(\alpha)+(1-\alpha)H(p)
\end{aligned}
$$
So $I(X;Y)=H(Y)-H(Y|X)=H(\alpha)+(1-\alpha)H(p)-H(\alpha)=(1-\alpha)H(p)$.
So $C=\max_{p_X} I(X;Y)=\max_{p\in [0,1]} (1-\alpha)H(p)=1-\alpha$, with the maximum attained at $p=\frac{1}{2}$, where $H(p)=1$.
So the capacity of the BEC is $C = 1 - \alpha$.
</details>
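As a numerical check of the proof, the sketch below maximizes $(1-\alpha)H(p)$ over a grid of input distributions; the value $\alpha = 0.3$ and the grid resolution are arbitrary choices.

```python
import math

def H(p):
    # Binary entropy function in bits; H(0) = H(1) = 0 by convention.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

alpha = 0.3
# From the proof: I(X;Y) = (1 - alpha) * H(p), with p = Pr(X = 1).
best_p = max((i / 1000 for i in range(1001)),
             key=lambda p: (1 - alpha) * H(p))
print(best_p, (1 - alpha) * H(best_p))  # 0.5, 0.7 = 1 - alpha
```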
### General interpretation of capacity
Recall $I(X;Y)=H(Y)-H(Y|X)=H(X)-H(X|Y)$.
Edge cases (both are computed in the sketch below):
- If $H(X|Y)=0$, the output $Y$ reveals all information about the input $X$.
  - A rate of $R=I(X;Y)=H(X)$ is achievable (the same rate as in information compression).
- If $H(X|Y)=H(X)$, then $Y$ reveals no information about $X$.
  - $R=I(X;Y)=0$: no information is transferred.
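A small sketch computing $I(X;Y)$ for both edge cases from explicit transition matrices; the uniform input and the particular 2×2 channels are illustrative assumptions.

```python
import math

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

def mutual_information(p_x, W):
    # W[x][y] = Pr(Y = y | X = x); compute I(X;Y) = H(Y) - H(Y|X).
    p_y = [sum(p_x[x] * W[x][y] for x in range(len(p_x)))
           for y in range(len(W[0]))]
    h_y_given_x = sum(p_x[x] * entropy(W[x]) for x in range(len(p_x)))
    return entropy(p_y) - h_y_given_x

p_x = [0.5, 0.5]                    # uniform input, H(X) = 1 bit
noiseless = [[1, 0], [0, 1]]        # Y = X: H(X|Y) = 0, so I = H(X)
useless = [[0.5, 0.5], [0.5, 0.5]]  # Y independent of X: I = 0
print(mutual_information(p_x, noiseless))  # 1.0
print(mutual_information(p_x, useless))    # 0.0
```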
> [!NOTE]
>
> Compression is transmission without noise.
## Side notes for Cryptography
Goal: Quantify the amount of information that is leaked to the eavesdropper.
- Let:
  - $M$ be the message distribution.
  - $Z$ be the ciphertext distribution.
- How much information is leaked about $m$ to the eavesdropper (who sees $\operatorname{Enc}(m)$)?
- Idea: One-time pad.
### One-time pad
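A minimal sketch of the one-time pad over bit strings, checking that the ciphertext distribution is identical for every message, so an eavesdropper seeing $\operatorname{Enc}(m)$ learns nothing; the brute-force enumeration over keys is an illustrative verification, not the lecture's argument.

```python
import itertools

def enc(m, k):
    # One-time pad: Z = M XOR K, bitwise, with K uniform over {0,1}^n.
    return tuple(mi ^ ki for mi, ki in zip(m, k))

n = 3
keys = list(itertools.product([0, 1], repeat=n))
for m in [(0, 0, 0), (1, 0, 1)]:
    counts = {}
    for k in keys:
        z = enc(m, k)
        counts[z] = counts.get(z, 0) + 1
    # Each ciphertext arises from exactly one key, so Z is uniform
    # regardless of m, i.e. I(M; Z) = 0.
    print(m, sorted(counts.values()))  # [1, 1, 1, 1, 1, 1, 1, 1]
```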