The maximum number of elements that can be represented by strings of $r$ letters from $B$ is $b^r$, so $c_r\leq b^r$.
This gives that the power series
$$
\sum_{r=1}^{\infty}c_r x^r
$$
has radius of convergence $R=\frac{1}{\limsup_{r\to\infty}\sqrt[r]{c_r}}\geq \frac{1}{b}$, so in particular $\sum_{r=1}^{\infty}b^{-r}c_r x^r$ converges for every $|x|<1$.
So the sum $\sum_{a\in A}b^{-l(f(a))}$ of the probabilities assigned to the codewords must be less than or equal to $1$ (the Kraft–McMillan inequality).
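As a quick numerical illustration (the function name `kraft_sum` and the codeword lengths below are made up for this sketch, not taken from the text), the sum $\sum_{a\in A}b^{-l(f(a))}$ can be computed directly from the codeword lengths:
```python
from fractions import Fraction

def kraft_sum(lengths, b):
    """Sum of b**(-l) over the given codeword lengths, computed exactly."""
    return sum(Fraction(1, b**l) for l in lengths)

# Hypothetical binary code (b = 2) with codeword lengths 1, 2, 3, 3,
# e.g. {0, 10, 110, 111}: the sum is exactly 1, so the lengths are feasible.
print(kraft_sum([1, 2, 3, 3], b=2))   # 1
# Lengths 1, 1, 2 give 5/4 > 1, so no uniquely decipherable binary code
# with these codeword lengths can exist.
print(kraft_sum([1, 1, 2], b=2))      # 5/4
```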
#### Sardinas–Patterson Algorithm
Let $A=\{a_1,a_2,\cdots,a_n\}$ be the message alphabet and $B=\{b_1,b_2,\cdots,b_m\}$ be the encoded alphabet.
We test whether the code $f:A\to S(B)$ is uniquely decipherable.
```python
def is_uniquely_decipherable(f, A):
    """Sardinas-Patterson test: f maps each letter of the message alphabet A
    to a codeword (a string over the encoded alphabet B)."""
    codewords = {f(x) for x in A}
    # Two letters sharing a codeword, or an empty codeword, already breaks
    # unique decipherability.
    if len(codewords) < len(set(A)) or "" in codewords:
        return False

    def dangling_suffixes(s, t):
        # Suffixes left over when a word of s is a proper prefix of a word of t.
        return {v[len(u):] for u in s for v in t
                if len(u) < len(v) and v.startswith(u)}

    # C_1: dangling suffixes obtained by comparing codewords with codewords.
    current = dangling_suffixes(codewords, codewords)
    seen = set()
    while current:
        # If a dangling suffix is itself a codeword, some message is ambiguous.
        if current & codewords:
            return False
        seen |= current
        # C_{n+1}: compare the new suffixes against the codewords in both
        # directions; dropping suffixes already seen guarantees termination.
        current = (dangling_suffixes(current, codewords)
                   | dangling_suffixes(codewords, current)) - seen
    return True
```
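As a small usage sketch (the two codes below are standard textbook examples, not from these notes; $f$ is passed as a dict lookup together with its key set as $A$): the code $\{0, 01, 11\}$ is uniquely decipherable although it is not prefix-free, while $\{a, ab, b\}$ is not, since $ab$ can be read as one codeword or as $a$ followed by $b$.
```python
code_ok  = {"x": "0", "y": "01", "z": "11"}   # uniquely decipherable, not prefix-free
code_bad = {"u": "a", "v": "ab", "w": "b"}    # ambiguous: "ab" = "ab" or "a" + "b"

print(is_uniquely_decipherable(code_ok.get,  code_ok.keys()))   # True
print(is_uniquely_decipherable(code_bad.get, code_bad.keys()))  # False
```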
#### Definition 1.1.4
An elementary information source is a pair $(A,\mu)$, where $A$ is an alphabet and $\mu$ is a probability distribution on $A$, i.e. a function $\mu:A\to[0,1]$ such that $\sum_{a\in A}\mu(a)=1$.
The **mean code word length** of an information source $(A,\mu)$ given a code $f:A\to S(B)$ is defined as
$$
\overline{l}(\mu,f)=\sum_{a\in A}\mu(a)l(f(a))
$$
The **minimal mean code word length** $L(\mu)$ of the source is defined as
$$
L(\mu)=\min\{\overline{l}(\mu,f)|f:A\to S(B)\text{ is uniquely decipherable}\}
$$
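For a concrete feel of these two quantities (the source $\mu$, the code $f$, and the helper `mean_code_word_length` below are illustrative, not from the text), $\overline{l}(\mu,f)$ is simply a probability-weighted average of codeword lengths:
```python
# Hypothetical source over A = {a1, a2, a3, a4} and a binary prefix code for it.
mu = {"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125}
f  = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}

def mean_code_word_length(mu, f):
    """l_bar(mu, f) = sum over a of mu(a) * len(f(a))."""
    return sum(p * len(f[a]) for a, p in mu.items())

print(mean_code_word_length(mu, f))   # 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75
# L(mu) is the minimum of this quantity over all uniquely decipherable codes f.
```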
#### Theorem 1.1.5
Shannon's source coding theorem.
Let $(A,\mu)$ be an elementary information source and let $b$ be the size of the encoded alphabet $B$. Then
$$
\frac{H(\mu)}{\log b}\leq L(\mu)< \frac{H(\mu)}{\log b}+1,
$$
where $H(\mu)=-\sum_{a\in A}\mu(a)\log\mu(a)$ is the entropy of the source.
Note that $\log \prod_{a\in A}\left(\frac{v(a)}{\mu(a)}\right)^{\mu(a)}=\sum_{a\in A}\mu(a)\log \frac{v(a)}{\mu(a)}$; the negative of this sum, $\sum_{a\in A}\mu(a)\log \frac{\mu(a)}{v(a)}$, is called the Kullback–Leibler divergence (or relative entropy) of $\mu$ from $v$.
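As a small numerical check (the distributions and the helper `kl_divergence` below are made up for illustration), the Kullback–Leibler divergence is nonnegative and vanishes exactly when the two distributions coincide, which is the content of Gibbs' inequality noted below:
```python
from math import log

def kl_divergence(mu, v):
    """D(mu || v) = sum over a of mu(a) * log(mu(a) / v(a)), in nats."""
    return sum(p * log(p / v[a]) for a, p in mu.items() if p > 0)

mu = {"a1": 0.5,  "a2": 0.3,  "a3": 0.2}
v  = {"a1": 0.25, "a2": 0.25, "a3": 0.5}

print(kl_divergence(mu, v))   # ~0.218 > 0
print(kl_divergence(mu, mu))  # 0.0: the divergence vanishes only when mu = v
```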
> Jensen's inequality: Let $f$ be a convex function on the interval $(a,b)$. Then for any $x_1,x_2,\cdots,x_n\in (a,b)$ and $\lambda_1,\lambda_2,\cdots,\lambda_n\in [0,1]$ such that $\sum_{i=1}^{n}\lambda_i=1$, we have
> $$
> f\left(\sum_{i=1}^{n}\lambda_ix_i\right)\leq \sum_{i=1}^{n}\lambda_if(x_i)
> $$
> If $f$ is a convex function, there are three properties that are useful for the proof:
>
> 1. $f''(x)\geq 0$ for all $x\in (a,b)$ (when $f$ is twice differentiable)
> 2. For any $x,y\in (a,b)$, $f(x)\geq f(y)+(x-y)f'(y)$ (take the tangent line at $y$)
> 3. For any $x,y\in (a,b)$ and $0<\lambda<1$, we have $f(\lambda x+(1-\lambda)y)\leq \lambda f(x)+(1-\lambda)f(y)$ (take the line connecting $f(x)$ and $f(y)$)
>
> We use property 2: replacing $y=\sum_{i=1}^{n}\lambda_ix_i$ and $x=x_j$ gives $f(x_j)\geq f\left(\sum_{i=1}^{n}\lambda_ix_i\right)+\left(x_j-\sum_{i=1}^{n}\lambda_ix_i\right)f'\left(\sum_{i=1}^{n}\lambda_ix_i\right)$; multiplying by $\lambda_j$ and summing over $j$ makes the first-order terms cancel and yields the inequality above.
(This is also known as Gibbs' inequality: put into words, the information entropy of a distribution $P$ is less than or equal to its cross entropy with any other distribution $Q$.)
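Finally, a short numerical sketch of the bound in Theorem 1.1.5 (the source and binary code are the same hypothetical ones as in the earlier sketch): because every probability here is a power of $\tfrac{1}{2}$, the binary entropy equals the mean code word length, so the lower bound $\frac{H(\mu)}{\log b}\leq L(\mu)$ is attained.
```python
from math import log2

# Hypothetical source and binary (b = 2) prefix code, as in the earlier sketch.
mu = {"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125}
f  = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}

entropy_bits = -sum(p * log2(p) for p in mu.values() if p > 0)   # H(mu) / log 2
mean_length  = sum(p * len(f[a]) for a, p in mu.items())

# H(mu)/log b  <=  L(mu)  <=  mean length of any uniquely decipherable code.
print(entropy_bits, mean_length)   # 1.75 1.75 -- the lower bound is attained here
```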