# CSE347 Analysis of Algorithms (Lecture 10)

## Online Algorithms

### Example 1: Elevator

Problem: You've entered the lobby of a tall building and want to get to the top floor as quickly as possible. There is an elevator that takes $E$ time to reach the top once it arrives. You can also take the stairs, which take $S$ time to climb (once you start), with $S>E$. However, you **do not know** when the elevator will arrive.

#### Offline (Clairvoyant) vs. Online

Offline: If you know that the elevator will arrive in $T$ time, then what will you do?

- Easy. I will compare $E+T$ with $S$ and take the smaller one.

Online: You do not know when the elevator will arrive.

- You can either wait for the elevator or take the stairs.

#### Strategies

**Always take the stairs.**

Your cost: $S$.

Optimal cost: $E$ (if the elevator arrives immediately, $T=0$).

Your cost / Optimal cost = $\frac{S}{E}$.

$S$ can be arbitrarily large. For example, the Empire State Building has $103$ floors.

**Wait for the elevator.**

Your cost: $T+E$.

Optimal cost: $S$ (if $T$ is large).

Your cost / Optimal cost = $\frac{T+E}{S}$.

$T$ can be arbitrarily large. For an out-of-service elevator, $T$ could be infinite.
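
A minimal sketch of the two naive strategies, assuming all costs are plain numbers. The function names (`offline_cost`, `always_stairs_ratio`, `always_wait_ratio`) and the sample values are made up for illustration; the point is only that each ratio grows without bound as $S$ or $T$ grows.

```python
def offline_cost(T, E, S):
    """Clairvoyant cost: the cheaper of elevator (T + E) and stairs (S)."""
    return min(T + E, S)


def always_stairs_ratio(T, E, S):
    """Cost ratio of 'always take the stairs' against the clairvoyant choice."""
    return S / offline_cost(T, E, S)


def always_wait_ratio(T, E, S):
    """Cost ratio of 'always wait for the elevator' against the clairvoyant choice."""
    return (T + E) / offline_cost(T, E, S)


# Elevator arrives at once, but the stairs are long: ratio S / E.
print(always_stairs_ratio(T=0, E=1, S=100))    # 100.0
# Elevator arrives very late: ratio (T + E) / S.
print(always_wait_ratio(T=1000, E=1, S=10))    # 100.1
```
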

#### Online Algorithms

Definition: An **online algorithm** must make decisions **without** full information about the problem instance [in this case $T$] and/or without knowing the future [e.g., it makes a decision immediately as each job comes in, without knowing the future jobs].

An **offline algorithm** has full information about the problem instance.

### Competitive Ratio

The quality of an online algorithm is quantified by its **competitive ratio** (the idea is similar to the approximation ratio in optimization).

Consider a (minimization) problem $L$ and let $l$ be an instance of this problem.

$C^*(l)$ is the cost of the optimal offline solution with full information and unlimited computational power.

$A$ is the online algorithm for $L$.

$C_A(l)$ is the cost of $A$'s solution on $l$.

An online algorithm $A$ is $\alpha$-competitive if

$$
\frac{C_A(l)}{C^*(l)}\leq \alpha
$$

for all instances $l$ of the problem.

In other words, the competitive ratio of $A$ is the smallest such $\alpha$: $\alpha=\max_l\frac{C_A(l)}{C^*(l)}$.

For maximization problems the ratio is inverted, $\alpha=\max_l\frac{C^*(l)}{C_A(l)}$, so in both cases we want the competitive ratio to be as small as possible.

### Back to the Elevator Problem

**Strategy 1**: Always take the stairs. The ratio is $\frac{S}{E}$, which can be arbitrarily large.

**Strategy 2**: Wait for the elevator. The ratio is $\frac{T+E}{S}$, which can be arbitrarily large.

**Strategy 3**: Do not decide immediately. Wait for $R$ time, then take the stairs if the elevator has not arrived.

Question: What is the value of $R$? (How long should we wait?)

Let's try $R=S$.

Claim: The competitive ratio is $2$.

<details>
<summary>Proof</summary>

Case 1: The optimal offline solution takes the elevator, so $T+E\leq S$.

Then $T\leq S-E<R$, so the elevator arrives before we give up and we also take the elevator.

Competitive ratio = $\frac{T+E}{T+E}=1$.

Case 2: The optimal offline solution takes the stairs immediately, at cost $S$.

In the worst case, we wait for $R=S$ time and then take the stairs for another $S$ time.

Competitive ratio = $\frac{R+S}{S}=\frac{2S}{S}=2$.

</details>

Let's try $R=S-E$ instead.

Claim: The competitive ratio is $\max\{1,2-\frac{E}{S}\}$.

<details>
<summary>Proof</summary>

Case 1: The optimal offline solution takes the elevator, so $T+E\leq S$.

Then $T\leq S-E=R$, so we also take the elevator.

Competitive ratio = $\frac{T+E}{T+E}=1$.

Case 2: The optimal offline solution takes the stairs immediately, at cost $S$.

In the worst case, we wait for $R=S-E$ time and then take the stairs.

Competitive ratio = $\frac{(S-E)+S}{S}=2-\frac{E}{S}$.

</details>

What if we wait less time? Let's try $R=S-E-\epsilon$ for some $\epsilon>0$.

In the worst case, the elevator arrives right after we give up ($T$ just above $S-E-\epsilon$): we wait for $S-E-\epsilon$ time and then take the stairs for $S$, while the optimal offline solution takes the elevator at cost $T+E\approx S-\epsilon$.

Competitive ratio = $\frac{(S-E-\epsilon)+S}{(S-E-\epsilon)+E}=\frac{2S-E-\epsilon}{S-\epsilon}>2-\frac{E}{S}$.

So the optimal competitive ratio is $\max\{1,2-\frac{E}{S}\}$, achieved when we wait for $S-E$ time.
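
The threshold analysis above can be checked numerically. The sketch below is only an illustration: the names `threshold_cost` and `worst_ratio`, the values $E=2$, $S=10$, and the grid of arrival times are all assumptions, and a finite grid only approximates the true worst case when $R<S-E$. Exact fractions are used so the ratios print as rationals.

```python
from fractions import Fraction


def threshold_cost(T, R, E, S):
    """Cost of 'wait up to R, then take the stairs' if the elevator arrives at T."""
    if T <= R:
        return T + E      # the elevator showed up in time
    return R + S          # gave up after waiting R, then climbed the stairs


def worst_ratio(R, E, S, arrival_times):
    """Largest cost ratio against the clairvoyant choice over the given arrival times."""
    return max(
        Fraction(threshold_cost(T, R, E, S)) / min(T + E, S)
        for T in arrival_times
    )


E, S = 2, 10                                       # hypothetical values with S > E
times = [Fraction(i, 4) for i in range(161)]       # T = 0, 1/4, ..., 40
for R in (S, S - E, S - E - 2, S - E + 1):
    print(R, worst_ratio(R, E, S, times))
# R = 10 -> 2, R = 8 -> 9/5, R = 6 -> 64/33, R = 9 -> 19/10
```

On this grid, $R=S-E=8$ gives $\frac{9}{5}=2-\frac{E}{S}$, and both the shorter wait ($R=6$) and the longer waits ($R=9$, $R=10$) come out worse.
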

### Example 2: Cache Replacement

Cache: Data in a cache is organized in blocks (also called pages or cache lines).

If the CPU accesses data that is already in the cache, it is called a **cache hit**, and the access is fast.

If the CPU accesses data that is not in the cache, it is called a **cache miss**. The block is brought into the cache from main memory. If the cache already holds $k$ blocks (it is full), then another block needs to be **kicked out** (eviction).

Goal: Minimize the number of cache misses.

**Clairvoyant policy**: Knows what will be accessed in the future and the order of the accesses.

FIF (farthest in the future): evict the block whose next access is farthest in the future.

Example: $k=3$, so the cache holds $3$ blocks.

Sequence: $A B C D C A B$

Cache: $A B C$ after $3$ warm-up misses, then evict $B$ for $D$ ($1$ miss). $C$ and $A$ then hit, and the final $B$ misses again, for $5$ misses in total.
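
A small sketch of the farthest-in-the-future rule, assuming the whole access sequence is available up front. The function name `fif_misses` is made up for illustration, and the next-use lookup is a plain linear scan, so this is quadratic and only meant for tiny examples.

```python
def fif_misses(sequence, k):
    """Count misses of the clairvoyant farthest-in-the-future policy with k blocks."""
    cache = set()
    misses = 0
    for i, block in enumerate(sequence):
        if block in cache:
            continue                      # cache hit
        misses += 1
        if len(cache) == k:
            def next_use(b):
                # Position of b's next access after i, or infinity if there is none.
                for j in range(i + 1, len(sequence)):
                    if sequence[j] == b:
                        return j
                return float("inf")
            # Evict the cached block that is needed farthest in the future.
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses


print(fif_misses("ABCDCAB", 3))   # 5: three warm-up misses, then D and the final B
```
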

Online algorithm: least recently used (LRU), which evicts the block whose last access is oldest.

Example: $A B C D C A B$

Cache: $A B C$ after $3$ warm-up misses, then evict $A$ for $D$. $1$ miss.

Cache: $D B C$, then evict $B$ for $A$. $1$ miss.

Cache: $D A C$, then evict $D$ for $B$. $1$ miss.
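
The LRU trace above can be reproduced with a short sketch. The function name `lru_misses` is an assumption for illustration; recency is tracked with `collections.OrderedDict`, whose key order is kept from least to most recently used.

```python
from collections import OrderedDict


def lru_misses(sequence, k):
    """Count misses of least-recently-used replacement with k cache blocks."""
    cache = OrderedDict()                 # keys ordered from least to most recently used
    misses = 0
    for block in sequence:
        if block in cache:
            cache.move_to_end(block)      # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False) # evict the least recently used block
            cache[block] = True
    return misses


print(lru_misses("ABCDCAB", 3))   # 6: three warm-up misses, then D, A and B
```
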

#### Competitive Ratio for LRU

Claim: LRU is $(k+1)$-competitive.

<details>
<summary>Proof</summary>

Split the sequence into subsequences such that each subsequence contains $k+1$ distinct blocks.

For example, suppose $k=3$ and the sequence is $ABCDCEFGEA$; the subsequences are $ABCDC$ and $EFGEA$.

LRU: In each subsequence, it has at most $k+1$ misses.

The optimal offline solution: In each subsequence, it must have at least $1$ miss, because $k+1$ distinct blocks cannot all fit in a cache of size $k$.

So the competitive ratio is at most $k+1$.

</details>
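
The splitting step used in the proof can be sketched as a greedy scan: keep extending the current subsequence until the next block would be its $(k+2)$th distinct one. The helper name `split_into_phases` and its `max_distinct` parameter are made up for illustration; the last subsequence may end up with fewer distinct blocks.

```python
def split_into_phases(sequence, max_distinct):
    """Greedily cut a sequence into runs with at most max_distinct distinct blocks."""
    phases, current, seen = [], [], set()
    for block in sequence:
        if block not in seen and len(seen) == max_distinct:
            phases.append(current)        # close the run before a new distinct block
            current, seen = [], set()
        current.append(block)
        seen.add(block)
    if current:
        phases.append(current)            # trailing, possibly shorter, run
    return phases


k = 3
print(["".join(p) for p in split_into_phases("ABCDCEFGEA", k + 1)])
# ['ABCDC', 'EFGEA']
```
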

Using a similar analysis, we can show that LRU is $k$-competitive.

Hint for the proof:

Split the sequence into subsequences such that LRU has $k$ misses in each subsequence.

Argue that OPT has at least $1$ miss in each subsequence.

#### Many sensible algorithms are $k$-competitive

**Lower Bound**: No deterministic online algorithm is better than $k$-competitive.
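
One way to see where the lower bound comes from is an adversary that works with only $k+1$ pages and always requests the one page the online cache does not hold. The sketch below uses LRU purely as a stand-in for an arbitrary deterministic policy; the function name `adversary_against_lru` is hypothetical. Every request is a miss for the online algorithm, while an offline algorithm on the same $k+1$ pages can arrange to miss only about once every $k$ requests.

```python
from collections import OrderedDict


def adversary_against_lru(k, n):
    """Build n requests over pages 0..k, always asking for a page LRU lacks."""
    cache = OrderedDict()                 # keys ordered from least to most recently used
    requests, misses = [], 0
    for _ in range(n):
        # The adversary inspects the cache and requests a missing page;
        # one always exists because there are k + 1 pages and only k slots.
        page = next(p for p in range(k + 1) if p not in cache)
        requests.append(page)
        if page in cache:                 # never true here, kept for clarity
            cache.move_to_end(page)
            continue
        misses += 1
        if len(cache) == k:
            cache.popitem(last=False)     # LRU eviction
        cache[page] = True
    return requests, misses


requests, misses = adversary_against_lru(k=3, n=8)
print(requests, misses)   # [0, 1, 2, 3, 0, 1, 2, 3] 8
```
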

**Resource augmentation**: The offline algorithm (which knows the future) has $k$ cache lines, and the online algorithm has $ck$ cache lines with $c>1$.

##### Lemma: Competitive Ratio is $\sim \frac{c}{c-1}$

Say $c=2$. LRU has twice as much cache as OPT. Then LRU is $2$-competitive.

<details>
<summary>Proof</summary>

LRU has a cache of size $ck$ (for $c=2$, size $2k$).

Divide the sequence into subsequences such that each has $ck$ distinct pages.

In each subsequence, LRU has at most $ck$ misses.

OPT has at least $(c-1)k$ misses, since its cache holds only $k$ of the $ck$ distinct pages.

So the competitive ratio is at most $\frac{ck}{(c-1)k}=\frac{c}{c-1}$.

_Actual competitive ratio is $\sim \frac{c}{c-1+\frac{1}{k}}$._

</details>
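
As a quick numeric reading of the lemma (the particular values of $c$ are chosen here only for illustration), the bound $\frac{c}{c-1}$ drops quickly as the online cache grows, and the refined expression quoted in the proof reduces to $k$ at $c=1$, which matches the $k$-competitive bound without augmentation:

$$
\left.\frac{c}{c-1}\right|_{c=2}=2,\qquad
\left.\frac{c}{c-1}\right|_{c=\frac{3}{2}}=3,\qquad
\left.\frac{c}{c-1}\right|_{c=\frac{11}{10}}=11,\qquad
\left.\frac{c}{c-1+\frac{1}{k}}\right|_{c=1}=k
$$
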

### Conclusion

- Definition: some information unknown
- Clairvoyant vs. Online
- Competitive Ratio
- Example:
  - Elevator
  - Cache Replacement

### Example 3: Pessimal cache problem

Maximize the number of cache misses.

Maximization problem: the competitive ratio is $\max\{\frac{\text{cost of the optimal offline algorithm}}{\text{cost of our algorithm}}\}$.

Equivalently, consider $\min\{\frac{\text{cost of our algorithm}}{\text{cost of the optimal offline algorithm}}\}$.

The size of the cache is $k$.

So if OPT has $X$ cache misses, we want $\geq \frac{X}{\alpha}$ cache misses, where $\alpha$ is the competitive ratio.

Claim: OPT can (almost) always miss, except when the same page is accessed twice in a row.

Claim: No deterministic online algorithm has a bounded competitive ratio (that is, a ratio independent of the length of the sequence).

Proof:

Start with an empty cache (the size of the cache is $k$).

The first $k$ unique pages all miss.

$P_1,P_2,\cdots,P_k|P_{k+1},P_{k+2},\cdots,P_{2k}$

Say your deterministic online algorithm chooses to evict $P_i$ for some $i\in\{1,2,\cdots,k\}$.

We want to choose $P_i$ for $i\in\{1,2,\cdots,k\}$ such that the number of misses is maximized.

The optimal offline solution: evict the page that will be accessed soonest (the opposite of FIF), so that almost every access misses. Let $\sigma$ be its number of misses.

The online algorithm: evict $P_i$ for $i\in\{1,2,\cdots,k\}$. It will have only $k+1$ misses in the worst case.

So the competitive ratio is at least $\frac{\sigma}{k+1}$, which grows with the length of the sequence and is therefore unbounded.

#### Randomized most recently used (RAND MRU)

MRU without randomization is a deterministic algorithm, and thus its competitive ratio is unbounded.

The first $k$ unique accesses bring all of those pages into the cache.

On the $(k+1)$th unique access, pick a random page from the cache and evict it.

After that, evict the MRU page on every miss.

Claim: RAND MRU is $k$-competitive.

#### Lemma: After the first $k+1$ unique accesses, at all times

1. $1$ page is in the cache with probability $1$ (the MRU one).
2. There exist $k$ pages, each of which is in the cache with probability $1-\frac{1}{k}$.
3. All other pages are in the cache with probability $0$.

<details>
<summary>Proof</summary>

By induction.

Base case: right after the first $k+1$ unique accesses and before the $(k+2)$th access.

1. $P_{k+1}$ is in the cache with probability $1$.
2. When we brought $P_{k+1}$ into the cache, we evicted one page uniformly at random (i.e., each $P_i$ with $i\leq k$ was evicted with probability $\frac{1}{k}$, so $P_i$ is still in the cache with probability $1-\frac{1}{k}$).
3. All other pages are definitely not in the cache because we have not seen them yet.

Inductive cases:

Let $P$ be a page that is in the cache with probability $0$.

Accessing $P$ is a cache miss, and RAND MRU evicts the MRU page $P'$ to make room for $P$, so $P'$ is now in the cache with probability $0$.

1. $P$ is in the cache with probability $1$.
2. By induction, there exists a set of $k$ pages, each of which is in the cache with probability $1-\frac{1}{k}$.
3. All other pages are in the cache with probability $0$.

Let $P$ be a page that is in the cache with probability $1-\frac{1}{k}$.

With probability $\frac{1}{k}$, $P$ is not in the cache, and RAND MRU evicts the MRU page $P'$ and brings $P$ into the cache. Afterwards $P$ is in the cache with probability $1$, and $P'$ takes $P$'s place among the $k$ pages that are in the cache with probability $1-\frac{1}{k}$.

</details>
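
Since RAND MRU makes only one random choice (which of the $k$ cached pages to evict first), the lemma can be checked exactly on small inputs by enumerating all $k$ outcomes, each with weight $\frac{1}{k}$. This is a sketch under that reading of the algorithm; the names `rand_mru_run` and `membership_probabilities` and the `first_victim` parameter are made up for illustration.

```python
from fractions import Fraction


def rand_mru_run(seq, k, first_victim):
    """Run RAND MRU with its single random choice fixed to index first_victim.

    Returns (number of misses, set of pages cached at the end).
    """
    cache = []                # cached pages, in order of insertion
    mru = None                # page accessed most recently
    misses = 0
    evicted_once = False
    for page in seq:
        if page not in cache:
            misses += 1
            if len(cache) == k:
                if not evicted_once:
                    victim = cache[first_victim]   # stands in for the random pick
                    evicted_once = True
                else:
                    victim = mru                   # afterwards: evict the MRU page
                cache.remove(victim)
            cache.append(page)
        mru = page
    return misses, set(cache)


def membership_probabilities(seq, k):
    """Exact probability that each page is cached after seq, over the k outcomes."""
    probs = {}
    for v in range(k):
        _, cache = rand_mru_run(seq, k, v)
        for page in cache:
            probs[page] = probs.get(page, Fraction(0)) + Fraction(1, k)
    return probs


print(membership_probabilities("ABCD", 3))    # D: 1, and A, B, C: 2/3 each
print(membership_probabilities("ABCDA", 3))   # A: 1, and B, C, D: 2/3 each
```

In both runs one page has probability $1$ and $k=3$ pages have probability $1-\frac{1}{k}=\frac{2}{3}$, as the lemma states.
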

RAND MRU is $k$-competitive.

<details>
<summary>Proof</summary>

Case 1: The MRU page is accessed.

Neither OPT nor our algorithm misses.

Case 2: Some other page is accessed.

OPT definitely misses.

RAND MRU misses with probability $\geq \frac{1}{k}$.

Let the random variable $X$ be the number of misses of RAND MRU on this access.

$E[X]\geq \frac{1}{k}$, while OPT has exactly $1$ miss, so the ratio is at most $k$.

</details>