Trance-0/NoteNextra-origin


Zheyuan Wu f08d8ff674 update notes

2024-11-18 14:07:36 -06:00


Lecture 10

Online Algorithms

Example 1: Elevator

Problem: You've entered the lobby of a tall building, and want to go to the top floor as quickly as possible. There is an elevator which takes E time to get to the top once it arrives. You can also take the stairs which takes S time to climb (once you start) with S>E. However, you do not know when the elevator will arrive.

Offline (Clairvoyant) vs. Online

Offline: If you know that the elevator is arriving in T time, then what will you do?

Easy. I will compare E+T with S and take the smaller one.

Online: You do not know when the elevator will arrive.

You can either wait for the elevator or take the stairs.

Strategies

Always take the stairs.

Your cost: S.

Optimal cost: E (if the elevator arrives immediately).

Your cost / Optimal cost = \frac{S}{E}.

S could be arbitrarily large relative to E. For example, the Empire State Building has 103 floors.

Wait for the elevator

Your cost: T+E.

Optimal cost: S (if T is large).

Your cost / Optimal cost = \frac{T+E}{S}.

T could be arbitrarily large. For an out-of-service elevator, T could be infinite.
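The two naive strategies can be compared numerically. The sketch below is only an illustration: the function names and the sample values of E, S, and T are assumptions chosen for the example, not part of the lecture.

```python
def offline_cost(T, E, S):
    """Clairvoyant cost: the cheaper of waiting for the elevator or climbing."""
    return min(T + E, S)


def always_stairs_cost(T, E, S):
    """Strategy: ignore the elevator and climb immediately."""
    return S


def always_wait_cost(T, E, S):
    """Strategy: wait for the elevator no matter how long it takes."""
    return T + E


E, S = 1.0, 10.0  # assumed sample values with S > E

# Elevator arrives immediately: the stairs strategy pays S/E times the optimum.
print(always_stairs_cost(0.0, E, S) / offline_cost(0.0, E, S))  # 10.0

# Elevator arrives very late: waiting pays (T + E)/S times the optimum.
print(always_wait_cost(999.0, E, S) / offline_cost(999.0, E, S))  # 100.0
```

Increasing S (for the first strategy) or T (for the second) makes either ratio as large as you like.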

Online Algorithms

Definition: An online algorithm must make decisions without full information about the problem instance [in this case $T$] and/or without knowing the future [e.g. it makes a decision immediately as each job comes in, without knowing the future jobs].

An offline algorithm has the full information about the problem instance.

Competitive Ratio

The quality of an online algorithm is quantified by its competitive ratio (the idea is similar to the approximation ratio in optimization).

Consider a problem L (minimization) and let l be an instance of this problem.

C^*(l) is the cost of the optimal offline solution with full information and unlimited computational power.

A is the online algorithm for L.

C_A(l) is the value of $A$'s solution on l.

An online algorithm A is $\alpha$-competitive if


\frac{C_A(l)}{C^*(l)}\leq \alpha

for all instances l of the problem.

In other words, the best such \alpha is \max_l\frac{C_A(l)}{C^*(l)}.
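As a rough illustration of the definition, the ratio can be estimated by taking the maximum over a finite set of instances. This only gives a lower bound on the true competitive ratio, since the definition ranges over all instances. The helper name `empirical_ratio` and the sample instances are assumptions made for this sketch.

```python
def empirical_ratio(alg_cost, opt_cost, instances):
    """Largest observed C_A(l) / C^*(l) over the given instances."""
    return max(alg_cost(l) / opt_cost(l) for l in instances)


# Elevator problem with assumed values E = 1, S = 10.
# An instance is the (unknown) arrival time T of the elevator.
E, S = 1.0, 10.0
arrival_times = [t / 10 for t in range(0, 201)]

ratio = empirical_ratio(
    alg_cost=lambda T: S,                # "always take the stairs"
    opt_cost=lambda T: min(T + E, S),    # clairvoyant optimum
    instances=arrival_times,
)
print(ratio)  # 10.0, attained at T = 0
```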

For maximization problems, the ratio is inverted (optimal value over our value), so we still want to minimize the competitive ratio.

Back to the Elevator Problem

Strategy 1: Always take the stairs. The ratio is \frac{S}{E}, which can be arbitrarily large.

Strategy 2: Wait for the elevator. The ratio is \frac{T+E}{S}, which can be arbitrarily large.

Strategy 3: We do not make a decision immediately. Let's wait for R time and then take the stairs if the elevator has not arrived.

Question: What is the value of R? (how long to wait?)

Let's try R=S.

Claim: The competitive ratio is 2.

Proof:

Case 1: The optimal offline solution takes the elevator, so T+E\leq S.

Since T\leq S-E<R, the elevator arrives before we give up, so we also take the elevator.

Competitive ratio = \frac{T+E}{T+E}=1.

Case 2: The optimal offline solution takes the stairs immediately, at cost S.

In the worst case, we wait for R=S time and then climb the stairs for S time.

Competitive ratio = \frac{R+S}{S}=\frac{2S}{S}=2.

EOP

Let's try R=S-E instead.

Claim: The competitive ratio is \max\{1,2-\frac{E}{S}\}.

Proof:

Case 1: The optimal offline solution takes the elevator, so T+E\leq S.

Since T\leq S-E=R, the elevator arrives by the time we would give up, so we also take the elevator.

Competitive ratio = \frac{T+E}{T+E}=1.

Case 2: The optimal offline solution takes the stairs immediately, at cost S.

We wait for R=S-E time and then take the stairs.

Competitive ratio = \frac{S-E+S}{S}=2-\frac{E}{S}.

EOP

What if we wait less time? Let's try R=S-E-\epsilon for some \epsilon>0.

In the worst case, the elevator arrives just after we give up: we wait for S-E-\epsilon time and then take the stairs for S, while the optimal offline solution takes the elevator at cost (S-E-\epsilon)+E.

Competitive ratio = \frac{(S-E-\epsilon)+S}{(S-E-\epsilon)+E}=\frac{2S-E-\epsilon}{S-\epsilon}>2-\frac{E}{S}.

So the optimal competitive ratio is \max\{1,2-\frac{E}{S}\}, achieved when we wait for S-E time.
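The effect of the waiting time R can be checked with a small simulation. This is a minimal sketch under assumed values E = 1 and S = 10; the function names and the grid of arrival times are hypothetical choices for the example. Because the grid is finite, the printed values only approximate the worst case.

```python
def wait_then_stairs_cost(T, E, S, R):
    """Wait up to R for the elevator; if it has not arrived by then, climb."""
    if T <= R:
        return T + E
    return R + S


def worst_ratio(E, S, R, steps=2000):
    """Largest cost ratio against the clairvoyant optimum over a grid of T in [0, 2S]."""
    arrival_times = [i * 2 * S / steps for i in range(steps + 1)]
    return max(
        wait_then_stairs_cost(T, E, S, R) / min(T + E, S)
        for T in arrival_times
    )


E, S = 1.0, 10.0
print(worst_ratio(E, S, S))          # 2.0  (R = S)
print(worst_ratio(E, S, S - E))      # 1.9  (R = S - E, i.e. 2 - E/S)
print(worst_ratio(E, S, S - E - 1))  # larger than 1.9 (R = S - E - epsilon, epsilon = 1)
```

For these sample values, waiting exactly S-E gives the smallest worst-case ratio of the three choices.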

Example 2: Cache Replacement

Cache: Data in a cache is organized in blocks (also called pages or cache lines).

If the CPU accesses data that is already in the cache, it is called a cache hit, and the access is fast.

If the CPU accesses data that is not in the cache, it is called a cache miss, and the block is brought into the cache from main memory. If the cache already has k blocks (full), then another block needs to be kicked out (eviction).

Goal: Minimize the number of cache misses.

Clairvoyant policy: Knows what will be accessed in the future and the sequence of accesses.

FIF (farthest in the future): on a miss, evict the block whose next access is farthest in the future.

Example: k=3, cache has 3 blocks.

Sequence: A B C D C A B

Cache: A B C after 3 warm-up misses, then evict B for D (1 miss). C and A then hit, and the final access to B is 1 more miss.
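The FIF example can be replayed with a short simulation. This is a straightforward sketch, not an efficient implementation: `fif_misses` is a hypothetical helper name, and it rescans the rest of the sequence to find each block's next use.

```python
def fif_misses(sequence, k):
    """Count misses of the farthest-in-the-future policy with a cache of k blocks."""
    cache = set()
    misses = 0
    for i, block in enumerate(sequence):
        if block in cache:
            continue  # cache hit
        misses += 1
        if len(cache) >= k:
            def next_use(b):
                # Index of the next access to b, or infinity if b is never used again.
                for j in range(i + 1, len(sequence)):
                    if sequence[j] == b:
                        return j
                return float("inf")

            victim = max(cache, key=next_use)
            cache.remove(victim)
        cache.add(block)
    return misses


print(fif_misses("ABCDCAB", 3))  # 5
```

The count of 5 includes the 3 warm-up misses, the miss on D, and the miss on the final access to B.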

Online Algorithm: Least recently used (LRU)

LRU: on a miss, evict the block that was least recently used.

Example: A B C D C A B

Cache: A B C after 3 warm-up misses, then evict A for D. 1 miss.

Cache: D B C, then evict B for A. 1 miss.

Cache: D A C, then evict D for B. 1 miss.
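The LRU trace above can be reproduced with a small sketch built on `collections.OrderedDict`, which keeps keys in insertion order and lets us move a key to the end on each access. The helper name `lru_misses` is an assumption for this example.

```python
from collections import OrderedDict


def lru_misses(sequence, k):
    """Count misses of LRU with a cache of k blocks."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for block in sequence:
        if block in cache:
            cache.move_to_end(block)  # hit: block becomes most recently used
            continue
        misses += 1
        if len(cache) >= k:
            cache.popitem(last=False)  # evict the least recently used block
        cache[block] = True
    return misses


print(lru_misses("ABCDCAB", 3))  # 6
```

The 6 misses are the 3 warm-up misses plus the misses on D, A, and B.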

Competitive Ratio for LRU

Claim: LRU is $(k+1)$-competitive.

Proof:

Split the sequence into subsequences such that each subsequence contains k+1 distinct blocks.

For example, suppose k=3, sequence ABCDCEFGEA, subsequences are ABCDC and EFGEA.

LRU Cache: In each subsequence, it has at most k+1 misses.

The optimal offline solution: In each subsequence, must have at least 1 miss.

So the competitive ratio is at most k+1.

EOP

Using a similar analysis, we can show that LRU is k-competitive.

Hint for the proof:

Split the sequence into subsequences such that LRU has exactly k misses in each subsequence.

Argue that OPT has at least 1 miss in each subsequence.

EOP

Many sensible algorithms are $k$-competitive

Lower Bound: No deterministic online algorithm is better than $k$-competitive.
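To get a feel for the lower bound, consider LRU on a sequence that cycles through k+1 blocks: each request is for the one block LRU has just evicted. The sketch below compares LRU against the clairvoyant FIF policy on that sequence. The helper names, k = 3, and the sequence length are assumptions for illustration, and this is a single example rather than a proof for all deterministic algorithms.

```python
from collections import OrderedDict


def lru_misses(sequence, k):
    """Count misses of LRU with a cache of k blocks."""
    cache = OrderedDict()
    misses = 0
    for block in sequence:
        if block in cache:
            cache.move_to_end(block)
            continue
        misses += 1
        if len(cache) >= k:
            cache.popitem(last=False)
        cache[block] = True
    return misses


def fif_misses(sequence, k):
    """Count misses of the farthest-in-the-future policy with a cache of k blocks."""
    cache = set()
    misses = 0
    for i, block in enumerate(sequence):
        if block in cache:
            continue
        misses += 1
        if len(cache) >= k:
            def next_use(b):
                for j in range(i + 1, len(sequence)):
                    if sequence[j] == b:
                        return j
                return float("inf")

            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses


k = 3
n = 400
sequence = [i % (k + 1) for i in range(n)]  # cycle through k + 1 blocks

lru = lru_misses(sequence, k)
opt = fif_misses(sequence, k)
print(lru, opt, lru / opt)
```

On this sequence LRU misses on every access, while the clairvoyant policy misses far less often, so the ratio comes close to k.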

Resource augmentation: Offline algorithm (which knows the future) has k cache lines in its cache and the online algorithm has ck cache lines with c>1.

Lemma: Competitive Ratio is `\sim \frac{c}{c-1}`

Say c=2. LRU has twice as much cache as OPT. LRU is $2$-competitive.

Proof:

LRU has a cache of size ck.

Divide the sequence into subsequences such that each subsequence has ck distinct pages.

In each subsequence, LRU has at most ck misses.

OPT has at least (c-1)k misses in each subsequence, because its cache can hold at most k of the ck distinct pages.

So competitive ratio is at most \frac{ck}{(c-1)k}=\frac{c}{c-1}.

Actual competitive ratio is \sim \frac{c}{c-1+\frac{1}{k}}.

EOP
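Resource augmentation can be tried out on a concrete sequence. In the sketch below LRU gets ck cache lines and the clairvoyant FIF policy gets k, and the sequence cycles through ck+1 pages so that LRU misses on every access. The values k = 4 and c = 2, the sequence, and the helper names are assumptions for illustration.

```python
from collections import OrderedDict


def lru_misses(sequence, size):
    """Count misses of LRU with the given cache size."""
    cache = OrderedDict()
    misses = 0
    for page in sequence:
        if page in cache:
            cache.move_to_end(page)
            continue
        misses += 1
        if len(cache) >= size:
            cache.popitem(last=False)
        cache[page] = True
    return misses


def fif_misses(sequence, size):
    """Count misses of the farthest-in-the-future policy with the given cache size."""
    cache = set()
    misses = 0
    for i, page in enumerate(sequence):
        if page in cache:
            continue
        misses += 1
        if len(cache) >= size:
            def next_use(p):
                for j in range(i + 1, len(sequence)):
                    if sequence[j] == p:
                        return j
                return float("inf")

            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return misses


k, c = 4, 2
pages = c * k + 1
sequence = [i % pages for i in range(200 * pages)]

lru = lru_misses(sequence, c * k)  # online algorithm with ck cache lines
opt = fif_misses(sequence, k)      # offline algorithm with k cache lines
print(lru / opt, "vs bound", c / (c - 1))
```

For this sequence the measured ratio stays at or below \frac{c}{c-1}.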

Conclusion

Definition: some information unknown
Clairvoyant vs. Online
Competitive Ratio
Example:
- Elevator
- Cache Replacement

Example 3: Pessimal cache problem

Maximize number of cache misses.

Maximization problem: competitive ratio is \max\{\frac{\text{cost of the optimal offline algorithm}}{\text{cost of our algorithm}}\}.

Equivalently, consider \min\{\frac{\text{cost of our algorithm}}{\text{cost of the optimal offline algorithm}}\}.

The size of the cache is k.

So if OPT has X cache misses, we want \geq \frac{X}{\alpha} cache misses, where \alpha is the competitive ratio.

Claim: OPT could always miss (not quite) except when the same page is accessed twice in a row.

Claim: No deterministic online algorithm has a bounded competitive ratio. (that is independent of the length of the sequence)

Proof:

Start with an empty cache. (size of cache is k)

Miss the first k unique pages.

P_1,P_2,\cdots,P_k|P_{k+1},P_{k+2},\cdots,P_{2k}

Say your deterministic online algorithm chooses to evict P_i for some i\in\{1,2,\cdots,k\}.

We want to choose P_i for i\in\{1,2,\cdots,k\} such that the number of misses is maximized.

The optimal offline solution: since we are maximizing misses, it can evict the page that will be accessed soonest in the future. Let \sigma be the number of misses it incurs.

The online algorithm: evicts P_i for some i\in\{1,2,\cdots,k\}. It will have only k+1 misses in the worst case.

So the competitive ratio is at least \frac{\sigma}{k+1}, which is unbounded because \sigma grows with the length of the sequence.

Randomized most recently used (RAND, MRU)

MRU without randomization is a deterministic algorithm, and thus its competitive ratio is unbounded.

The first k unique accesses bring pages into the cache until it is full.

On the $(k+1)$th unique access, pick a random page from the cache and evict it.

After that, evict the MRU page on a miss.
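The policy can be sketched as follows. This is one reading of the description above, with assumptions stated in the comments: the random eviction happens exactly once, on the first miss with a full cache, and "the MRU page" means the page accessed immediately before the current one. The function name `rand_mru_misses` is hypothetical.

```python
import random


def rand_mru_misses(sequence, k, rng=None):
    """Count misses of RAND MRU with a cache of k pages (k >= 1)."""
    rng = rng or random.Random()
    cache = set()
    mru = None          # most recently accessed page (always in the cache)
    randomized = False  # becomes True after the single random eviction
    misses = 0
    for page in sequence:
        if page not in cache:
            misses += 1
            if len(cache) >= k:
                if not randomized:
                    # Assumption: first eviction picks a uniformly random cached page.
                    victim = rng.choice(sorted(cache))
                    randomized = True
                else:
                    # Every later eviction removes the most recently used page.
                    victim = mru
                cache.remove(victim)
            cache.add(page)
        mru = page
    return misses


# The first k + 1 unique accesses always miss, whatever the random choice is.
print(rand_mru_misses("ABCD", 3, random.Random(0)))  # 4
```

Passing a seeded `random.Random` makes a run repeatable. Averaging the miss count over many seeds gives an estimate of the expected number of misses on a sequence.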

Claim: RAND is $k$-competitive.

Lemma: After the first `k+1` unique accesses, at all times:

1 page is in the cache with probability 1 (the MRU one).
There exist k pages, each of which is in the cache with probability 1-\frac{1}{k}.
All other pages are in the cache with probability 0.

Proof:

By induction.

Base case: right after the first k+1 unique accesses and before the $(k+2)$th access.

P_{k+1} is in the cache with probability 1.
When we brought P_{k+1} to the cache, we evicted one page uniformly at random. (i.e. for each i\leq k, P_i is evicted with probability \frac{1}{k}, so P_i is still in the cache with probability 1-\frac{1}{k})
All other pages are definitely not in the cache because we did not see them yet.

Inductive cases:

Let P be a page that is in the cache with probability 0.

Accessing P is a cache miss, and RAND MRU evicts the MRU page P' to bring in P, so P' is now in the cache with probability 0.

P is in the cache with probability 1.
By induction, there exists a set of k pages each of which is in the cache with probability 1-\frac{1}{k}.
All other pages are in the cache with probability 0.

Let P be a page in the cache with probability 1-\frac{1}{k}.

With probability \frac{1}{k}, P is not in the cache, and RAND MRU evicts the MRU page P' and brings P into the cache. Afterwards P is in the cache with probability 1, and P' replaces P among the k pages that are in the cache with probability 1-\frac{1}{k}.

EOP

Claim: RAND MRU is $k$-competitive.

Proof:

Case 1: Access MRU page.

Both OPT and our algorithm don't miss.

Case 2: Access some other page.

OPT definitely misses.

RAND MRU misses with probability \geq \frac{1}{k}.

Let's define the random variable X as the number of misses of RAND MRU on such an access.

E[X]\geq \frac{1}{k}, while OPT has at most 1 miss, so the ratio is at most k.

EOP
