Lecture 10
Online Algorithms
Example 1: Elevator
Problem: You've entered the lobby of a tall building, and want to go to the top floor as quickly as possible. There is an elevator which takes E time to get to the top once it arrives. You can also take the stairs which takes S time to climb (once you start) with S>E. However, you do not know when the elevator will arrive.
Offline (Clairvoyant) vs. Online
Offline: If you know that the elevator is arriving in T time, then what will you do?
- Easy. I will compare E+T with S and take the smaller one.
Online: You do not know when the elevator will arrive.
- You can either wait for the elevator or take the stairs.
Strategies
Always take the stairs.
Your cost S
Optimal Cost: E.
Your cost / Optimal cost = \frac{S}{E}.
S could be arbitrarily large. For example, the Empire State Building has 103 floors.
Wait for the elevator
Your cost T+E
Optimal Cost: S (if T is large)
Your cost / Optimal cost = \frac{T+E}{S}.
T could be arbitrarily large. For an out-of-service elevator, T could be infinite.
Online Algorithms
Definition: An online algorithm must take decisions without full information about the problem instance [in this case $T$] and/or it does not know the future [e.g. makes decision immediately as jobs come in without knowing the future jobs].
An offline algorithm has the full information about the problem instance.
Competitive Ratio
The quality of an online algorithm is quantified by the competitive ratio (the idea is similar to the approximation ratio in optimization).
Consider a problem L (minimization) and let l be an instance of this problem.
C^*(l) is the cost of the optimal offline solution with full information and unlimited computational power.
A is the online algorithm for L.
C_A(l) is the value of $A$'s solution on l.
An online algorithm A is $\alpha$-competitive if
\frac{C_A(l)}{C^*(l)}\leq \alpha
for all instances l of the problem.
In other words, \alpha=\max_l\frac{C_A(l)}{C^*(l)}.
For maximization problems, the ratio is inverted (optimal value over the algorithm's value), and we still want to minimize the competitive ratio.
Back to the Elevator Problem
Strategy 1: Always take the stairs. Ratio is \frac{S}{E}, which can be arbitrarily large.
Strategy 2: Wait for the elevator. Ratio is \frac{T+E}{S}, which can be arbitrarily large.
Strategy 3: We do not make a decision immediately. Let's wait for R time and then take the stairs if the elevator has not arrived.
Question: What is the value of R? (how long to wait?)
Let's try R=S.
Claim: The competitive ratio is 2.
Proof:
Case 1: The optimal offline solution takes the elevator, then T+E\leq S.
We also take the elevator.
Competitive ratio = \frac{T+E}{T+E}=1.
Case 2: The optimal offline solution takes the stairs, immediately.
We wait for R time and then take the stairs. In the worst case, we wait for R=S time and then climb the stairs for S time.
Competitive ratio = \frac{R+S}{S}=\frac{2S}{S}=2.
EOP
Let's try R=S-E instead.
Claim: The competitive ratio is \max\{1,2-\frac{E}{S}\}.
Proof:
Case 1: The optimal offline solution takes the elevator, then T+E\leq S.
We also take the elevator.
Competitive ratio = \frac{T+E}{T+E}=1.
Case 2: The optimal offline solution takes the stairs, immediately.
We wait for R=S-E time and then take the stairs.
Competitive ratio = \frac{S-E+S}{S}=2-\frac{E}{S}.
EOP
What if we wait less time? Let's try R=S-E-\epsilon for some \epsilon>0.
In the worst case, we wait for S-E-\epsilon time and then take the stairs for S, while the elevator arrives just after we leave.
Competitive ratio = \frac{(S-E-\epsilon)+S}{S-E-\epsilon+E}=\frac{2S-E-\epsilon}{S-\epsilon}>2-\frac{E}{S}.
So the optimal competitive ratio is \max\{1,2-\frac{E}{S}\} when we wait for S-E time.
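The case analysis for the waiting strategy can be checked numerically. The following is a minimal sketch, assuming integer arrival times and illustrative values E=2, S=10; the function names `online_cost`, `offline_cost`, and `worst_ratio` are hypothetical helpers, not part of the lecture.

```python
from fractions import Fraction

def online_cost(T, E, S, R):
    """Cost of 'wait up to R, then take the stairs' when the elevator arrives at time T."""
    if T <= R:
        return T + E      # the elevator arrived while we were still waiting
    return R + S          # we gave up at time R and climbed the stairs

def offline_cost(T, E, S):
    """Clairvoyant cost: the smaller of waiting for the elevator and taking the stairs."""
    return min(T + E, S)

def worst_ratio(E, S, R, arrivals):
    """Largest online/offline cost ratio over the given arrival times."""
    return max(Fraction(online_cost(T, E, S, R), offline_cost(T, E, S))
               for T in arrivals)

E, S = 2, 10                      # assumed example values with S > E
arrivals = range(0, 50)           # assumed grid of integer arrival times
print(worst_ratio(E, S, S, arrivals))      # R = S     -> 2
print(worst_ratio(E, S, S - E, arrivals))  # R = S - E -> 9/5, i.e. 2 - E/S
```

An integer grid cannot show the R=S-E-\epsilon case, because the bad arrival time there lies just after R; a finer grid of arrival times would be needed for that.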
Example 2: Cache Replacement
Cache: Data in a cache is organized in blocks (also called pages or cache lines).
If the CPU accesses data that is already in the cache, it is called a cache hit, and the access is fast.
If the CPU accesses data that is not in the cache, it is called a cache miss. This block is brought into the cache from main memory. If the cache already has k blocks (full), then another block needs to be kicked out (eviction).
Goal: Minimize the number of cache misses.
Clairvoyant policy: Knows what will be accessed in the future and the sequence of accesses.
FIF (farthest in the future): evict the block whose next access is farthest in the future.
Example: k=3, cache has 3 blocks.
Sequence: A B C D C A B
Cache: A B C, then evict B for D. 3 warm-up misses and 1 miss. C and A then hit, and the final access to B is 1 more miss.
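The clairvoyant policy can be simulated directly, since the whole sequence is known in advance. This is a small sketch under the assumption that the sequence is a string of single-character block names; `fif_misses` is a hypothetical helper name.

```python
def fif_misses(seq, k):
    """Count misses of farthest-in-the-future eviction on seq with a cache of k blocks."""
    cache, misses = set(), 0
    for i, block in enumerate(seq):
        if block in cache:
            continue                      # cache hit
        misses += 1
        if len(cache) == k:
            def next_use(b):
                # Position of the next access to b, or infinity if b is never accessed again.
                for j in range(i + 1, len(seq)):
                    if seq[j] == b:
                        return j
                return float("inf")
            # Evict the block whose next access is farthest in the future.
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses

print(fif_misses("ABCDCAB", 3))  # 5: three warm-up misses, then D and the final B
```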
Online Algorithm: Least recently used (LRU)
LRU: evict the least recently used block.
Example: A B C D C A B
Cache: A B C, then evict A for D. 3 warm-up misses and 1 miss.
Cache: D B C, then evict B for A. 1 miss.
Cache: D A C, then evict D for B. 1 miss.
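The LRU trace above can be reproduced with a short simulation. This is a sketch only; `lru_misses` is a hypothetical name, and the choice of `OrderedDict` to track recency is an implementation assumption.

```python
from collections import OrderedDict

def lru_misses(seq, k):
    """Count misses of LRU eviction on seq with a cache of k blocks."""
    cache = OrderedDict()   # keys ordered from least to most recently used
    misses = 0
    for block in seq:
        if block in cache:
            cache.move_to_end(block)       # hit: block becomes the most recently used
            continue
        misses += 1
        if len(cache) == k:
            cache.popitem(last=False)      # evict the least recently used block
        cache[block] = True
    return misses

print(lru_misses("ABCDCAB", 3))  # 6: three warm-up misses, then D, A and B
```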
Competitive Ratio for LRU
Claim: LRU is $(k+1)$-competitive.
Proof:
Split the sequence into subsequences such that each subsequence contains k+1 distinct blocks.
For example, suppose k=3, sequence ABCDCEFGEA, subsequences are ABCDC and EFGEA.
LRU Cache: In each subsequence, it has at most k+1 misses.
The optimal offline solution: In each subsequence, must have at least 1 miss.
So the competitive ratio is at most k+1.
EOP
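The splitting step used in the proof can be sketched as a greedy scan: a subsequence is closed just before a block arrives that would be one distinct block too many. The helper name `split_phases` and the string representation of the sequence are assumptions for illustration.

```python
def split_phases(seq, distinct):
    """Greedily split seq into subsequences, each with at most `distinct` distinct blocks."""
    phases, current, seen = [], [], set()
    for block in seq:
        if block not in seen and len(seen) == distinct:
            # This block would exceed the limit, so close the current subsequence.
            phases.append("".join(current))
            current, seen = [], set()
        current.append(block)
        seen.add(block)
    if current:
        phases.append("".join(current))
    return phases

print(split_phases("ABCDCEFGEA", 4))  # ['ABCDC', 'EFGEA'] for k=3, i.e. k+1=4 distinct blocks
```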
Using a similar analysis, we can show that LRU is $k$-competitive.
Hint for the proof:
Split the sequence into subsequences such that LRU has k misses in each subsequence.
Argue that OPT has at least 1 miss in each subsequence.
EOP
Many sensible algorithms are $k$-competitive
Lower Bound: No deterministic online algorithm is better than $k$-competitive.
Resource augmentation: Offline algorithm (which knows the future) has k cache lines in its cache and the online algorithm has ck cache lines with c>1.
Lemma: Competitive Ratio is \sim \frac{c}{c-1}
Say c=2. LRU has twice as much cache as OPT. LRU is $2$-competitive.
Proof:
LRU has a cache of size ck.
Divide the sequence into subsequences such that each has ck distinct pages.
In each subsequence, LRU has at most ck misses.
The OPT has at least (c-1)k misses.
So competitive ratio is at most \frac{ck}{(c-1)k}=\frac{c}{c-1}.
Actual competitive ratio is \sim \frac{c}{c-1+\frac{1}{k}}.
EOP
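To get a feel for the two expressions, they can be evaluated for concrete values. This is a sketch with exact rational arithmetic; the function names `phase_bound` and `refined_ratio` and the values c=2, k=4 are illustrative assumptions.

```python
from fractions import Fraction

def phase_bound(c):
    """Bound from the subsequence argument: c / (c - 1)."""
    return Fraction(c) / (Fraction(c) - 1)

def refined_ratio(c, k):
    """The tighter expression quoted above: c / (c - 1 + 1/k)."""
    return Fraction(c) / (Fraction(c) - 1 + Fraction(1, k))

print(phase_bound(2))       # 2
print(refined_ratio(2, 4))  # 8/5
```

As k grows, the 1/k term vanishes and the two expressions agree.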
Conclusion
- Definition: some information unknown
- Clairvoyant vs. Online
- Competitive Ratio
- Example:
- Elevator
- Cache Replacement
Example 3: Pessimal cache problem
Maximize number of cache misses.
Maximization problem: competitive ratio is \max\{\frac{\text{cost of the optimal offline algorithm}}{\text{cost of our algorithm}}\}.
Or get \min\{\frac{\text{cost of our algorithm}}{\text{cost of the optimal offline algorithm}}\}.
The size of the cache is k.
So if OPT has X cache misses, we want \geq \frac{X}{\alpha} cache misses, where \alpha is the competitive ratio.
Claim: The OPT could always miss (not quite) except when the page is accessed twice in a row.
Claim: No deterministic online algorithm has a bounded competitive ratio. (that is independent of the length of the sequence)
Proof:
Start with an empty cache. (size of cache is k)
Miss the first k unique pages.
P_1,P_2,\cdots,P_k|P_{k+1},P_{k+2},\cdots,P_{2k}
Say your deterministic online algorithm chooses to evict P_i for i\in\{1,2,\cdots,k\}.
We want to choose P_i for i\in\{1,2,\cdots,k\} such that the number of misses is maximized.
The optimal offline solution: evict the page that will be accessed furthest in the future. Let \sigma denote its number of misses.
The online algorithm: evict P_i for i\in\{1,2,\cdots,k\}. It will have k+1 misses in the worst case.
So the competitive ratio is at least \frac{\sigma}{k+1}, which is unbounded as the sequence grows.
Randomized most recently used (RAND, MRU)
MRU without randomization is a deterministic algorithm, and thus, the competitive ratio is unbounded.
The first k unique accesses bring all pages to the cache.
On the $(k+1)$th access, pick a random page from the cache and evict it.
After that, evict the MRU page on a miss.
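The three rules above can be sketched as a simulation. This is a minimal illustration, not a definitive implementation: the name `rand_mru`, the `seed` parameter, and the use of `random.Random` for reproducibility are assumptions, and "the (k+1)th access" is read as the first miss that happens while the cache is full.

```python
import random

def rand_mru(seq, k, seed=0):
    """Simulate randomized MRU; return (number of misses, final cache contents)."""
    rng = random.Random(seed)       # seeded only so that runs are reproducible
    cache, misses = set(), 0
    mru = None                      # most recently accessed page (always in the cache)
    evicted_before = False
    for page in seq:
        if page not in cache:
            misses += 1
            if len(cache) == k:
                if not evicted_before:
                    # First eviction: pick a victim uniformly at random.
                    victim = rng.choice(sorted(cache))
                    evicted_before = True
                else:
                    # Every later eviction removes the most recently used page.
                    victim = mru
                cache.remove(victim)
            cache.add(page)
        mru = page
    return misses, cache

misses, cache = rand_mru("ABCDE", 3)
print(misses)   # 5: every access is to a new page
```

In this run D is brought in by the random eviction and is then evicted as the MRU page when E arrives, so the final cache holds E and two of A, B, C.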
Claim: RAND is $k$-competitive.
Lemma: After the first k+1 unique accesses at all times
- 1 page is in the cache with probability 1 (the MRU one)
- There exist k pages each of which is in the cache with probability 1-\frac{1}{k}
- All other pages are in the cache with probability 0.
Proof:
By induction.
Base case: right after the first k+1 unique accesses and before $k+2$th access.
- P_{k+1} is in the cache with probability 1.
- When we brought P_{k+1} to the cache, we evicted one page uniformly at random. (i.e. P_i is evicted with probability \frac{1}{k}, P_i is still in the cache with probability 1-\frac{1}{k})
- All other pages are definitely not in the cache because we did not see them yet.
Inductive cases:
Let P be a page that is in the cache with probability 0.
Accessing P is a cache miss, and RAND MRU evicts the MRU page P' to make room for P, so P' is now in the cache with probability 0.
- P is in the cache with probability 1.
- By induction, there exists a set of k pages each of which is in the cache with probability 1-\frac{1}{k}.
- All other pages are in the cache with probability 0.
Let P be a page in the cache with probability 1-\frac{1}{k}.
With probability \frac{1}{k}, P is not in the cache and RAND evicts P' in the cache and brings P to the cache.
EOP
RAND MRU is $k$-competitive.
Proof:
Case 1: Access MRU page.
Both OPT and our algorithm don't miss.
Case 2: Access some other page.
OPT definitely misses.
RAND MRU misses with probability \geq \frac{1}{k}.
Let's define the random variable X as the number of misses of RAND MRU.
E[X]\leq 1+\frac{1}{k}.
EOP