update notes

This commit is contained in:
Zheyuan Wu
2024-11-18 14:07:36 -06:00
parent ee64606236
commit f08d8ff674
47 changed files with 5863 additions and 0 deletions

245
pages/CSE347/CSE347_L1.md Normal file

@@ -0,0 +1,245 @@
# Lecture 1
## Greedy Algorithms
* Builds up a solution by making a series of small decisions that optimize some objective.
* Make one irrevocable choice at a time, creating smaller and smaller sub-problems of the same kind as the original problem.
* There are many potential greedy strategies and picking the right one can be challenging.
### A Scheduling Problem
You manage a giant space telescope.
* There are $n$ research projects that want to use it to make observations.
* Only one project can use the telescope at a time.
* Project $p_i$ needs the telescope starting at time $s_i$ and running for a length of time $t_i$.
* Goal: schedule as many as possible
Formally
Input:
* Given a set $P$ of projects, $|P|=n$
* Each request $p_i\in P$ occupies interval $[s_i,f_i)$, where $f_i=s_i+t_i$
Goal: Choose a subset $\Pi\subseteq P$ such that
1. No two projects in $\Pi$ have overlapping intervals.
2. The number of selected projects $|\Pi|$ is maximized.
#### Shortest Interval
Counter-example: `[1,10],[9,12],[11,20]`
#### Earliest start time
Counter-example: `[1,10],[2,3],[4,5]`
#### Fewest Conflicts
Counter-example: `[1,2],[1,4],[1,4],[3,6],[7,8],[5,8],[5,8]`
#### Earliest finish time
Correct... but why
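The counter-examples above can be replayed with a small sketch. The helper `greedy_by` and the three key functions are hypothetical names introduced here for illustration; intervals are treated as half-open $[s,f)$, matching the formal definition of the problem.

```python
def greedy_by(intervals, key):
    """Greedily scan intervals in the order given by `key`,
    keeping each one that does not overlap anything already chosen."""
    chosen = []
    for s, f in sorted(intervals, key=key):
        # half-open intervals [s, f): touching endpoints do not conflict
        if all(f <= cs or cf <= s for cs, cf in chosen):
            chosen.append((s, f))
    return chosen

def shortest(iv):
    return iv[1] - iv[0]

def earliest_start(iv):
    return iv[0]

def earliest_finish(iv):
    return iv[1]

ex_shortest = [(1, 10), (9, 12), (11, 20)]
ex_start = [(1, 10), (2, 3), (4, 5)]

print(len(greedy_by(ex_shortest, shortest)), len(greedy_by(ex_shortest, earliest_finish)))
print(len(greedy_by(ex_start, earliest_start)), len(greedy_by(ex_start, earliest_finish)))
```

On both inputs the "bad" strategy selects one interval while earliest finish time selects two.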
#### Theorem of Greedy Strategy (Earliest Finishing Time)
Say this greedy strategy (Earliest Finishing Time) picks a set $\Pi$ of intervals, some other strategy picks a set $O$ of intervals.
Assume sorted by finishing time
* $\Pi=\{i_1,i_2,...,i_k\},|\Pi|=k$
* $O=\{j_1,j_2,...,j_m\},|O|=m$
We want to show that $|\Pi|\geq|O|$, i.e. $k\geq m$
#### Lemma: For all $r\leq \min(k,m)$, $f_{i_r}\leq f_{j_r}$
We prove this by induction on $r$.
* Base case, when $r=1$.
Greedy picks the interval with the earliest finish time overall, so $O$ cannot pick an interval with an earlier finish time, and $f_{i_1}\leq f_{j_1}$.
* Inductive step, when $r>1$.
By the inductive hypothesis, $f_{i_{r-1}}\leq f_{j_{r-1}}$. Since $j_r$ does not overlap $j_{r-1}$, we have $s_{j_r}\geq f_{j_{r-1}}\geq f_{i_{r-1}}$, so $j_r$ is still available to the greedy strategy after it picks $i_{r-1}$. Greedy picks the available interval with the earliest finish time, so $f_{i_r}\leq f_{j_r}$.
#### Problem of “Greedy Stays Ahead” Proof
* Every problem needs a very different theorem.
* It can be challenging to even write down the correct statement that you must prove.
* We want a systematic approach to prove the correctness of greedy algorithms.
### Road Map to Prove Greedy Algorithm
#### 1. Make a Choice
Pick an interval based on greedy choice, say $q$
Proof: **Greedy Choice Property**: Show that making our first choice is not "fatal": at least one optimal solution makes this choice.
Techniques: **Exchange Argument**: "If an optimal solution does not choose $q$, we can turn it into an equally good solution that does."
Let $\Pi^*$ be any optimal solution for project set $P$.
- If $q\in \Pi^*$, we are done.
- Otherwise, let $x$ be the choice in $\Pi^*$ that $q$ replaces (for scheduling, the interval in $\Pi^*$ with the earliest finish time). We create another solution $\bar{\Pi^*}$ that replaces $x$ with $q$, and prove that $\bar{\Pi^*}$ is as good as $\Pi^*$.
#### 2. Create a smaller instance $P'$ of the original problem
$P'$ has the same optimization criteria.
Proof: **Inductive Structure**: Show that after making the first choice, we're left with a smaller version of the same problem, whose solution we can safely combine with the first choice.
Let $P'$ be the subproblem left after making first choice $q$ in problem $P$ and let $\Pi'$ be an optimal solution to $P'$. Then $\Pi=\Pi'\cup\{q\}$ is an optimal solution to $P$.
$P'=P-\{q\}-\{$projects conflicting with $q\}$
#### 3. Solution: Union of choices that we made
Union of choices that we made.
Proof: **Optimal Substructure**: Show that if we solve the subproblem optimally, adding our first choice creates an optimal solution to the *whole* problem.
Let $q$ be the first choice, $P'$ be the subproblem left after making $q$ in problem $P$, $\Pi'$ be an optimal solution to $P'$. We claim that $\Pi=\Pi'\cup \{q\}$ is an optimal solution to $P$.
We proceed the proof by contradiction.
Assume that $\Pi=\Pi'\cup\{q\}$ is not optimal.
By the greedy choice property (GCP), we already know that there exists an optimal solution $\Pi^*$ for problem $P$ that contains $q$. If $\Pi$ is not optimal, then $|\Pi^*|>|\Pi|$. Since $\Pi^*-\{q\}$ is also a feasible solution to $P'$ and $|\Pi^*-\{q\}|>|\Pi-\{q\}|=|\Pi'|$, this contradicts the assumption that $\Pi'$ is an optimal solution to $P'$.
#### 4. Put 1-3 together to write an inductive proof of the Theorem
This is independent of problem, same for every problem.
Use scheduling problem as an example:
Theorem: given a scheduling problem $P$, if we repeatedly choose the remaining feasible project with the earliest finishing time, we will construct an optimal feasible solution to $P$.
Proof: We proceed by induction on $|P|$. (based on the size of problem $P$).
- Base case: $|P|=1$.
- Inductive step.
- Inductive hypothesis: For all problems of size $<n$, earliest finishing time (EFT) gives us an optimal solution.
- EFT is optimal for problem of size $n$.
- Proof: Once we pick $q$ (the greedy choice), we are left with $P'=P-\{q\}-\{$intervals that conflict with $q\}$, and $|P'|<n$. By the inductive hypothesis, EFT gives us an optimal solution $\Pi'$ to $P'$. By the greedy choice property, inductive structure, and optimal substructure, $\Pi'\cup\{q\}$ is an optimal solution to $P$.
_this step always holds as long as the previous three properties hold, and we don't usually write the whole proof._
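```python
# Algorithm construction for the interval scheduling problem
def schedule(p):
    # p is a list of (start, finish) pairs; intervals are half-open [s, f)
    if not p:
        return []
    # sorting by finish time takes O(n log n)
    p = sorted(p, key=lambda x: x[1])
    res = [p[0]]
    # a single O(n) pass over the remaining intervals
    for i in p[1:]:
        # keep i if it starts no earlier than the last chosen finish time
        if res[-1][1] <= i[0]:
            res.append(i)
    return res
```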
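As a sanity check on the theorem, the greedy answer can be compared against exhaustive search on small random instances. This is only a sketch: `eft`, `brute_force_size`, and `check` are hypothetical helpers, intervals are half-open $[s,f)$ with positive length, and passing such a check is evidence, not a proof.

```python
from itertools import combinations
import random

def eft(intervals):
    """Earliest-finish-time greedy on half-open intervals (s, f)."""
    res = []
    for s, f in sorted(intervals, key=lambda iv: iv[1]):
        if not res or res[-1][1] <= s:
            res.append((s, f))
    return res

def brute_force_size(intervals):
    """Size of the largest pairwise non-overlapping subset (exponential time)."""
    def compatible(sub):
        sub = sorted(sub)
        # sorted by start, so checking neighbours is enough
        return all(a[1] <= b[0] for a, b in zip(sub, sub[1:]))
    for k in range(len(intervals), 0, -1):
        if any(compatible(c) for c in combinations(intervals, k)):
            return k
    return 0

def check(trials=200, n=7, seed=0):
    """Return True if greedy matches brute force on every random instance."""
    rng = random.Random(seed)
    for _ in range(trials):
        ivs = []
        for _ in range(n):
            s = rng.randint(0, 20)
            ivs.append((s, s + rng.randint(1, 6)))
        if len(eft(ivs)) != brute_force_size(ivs):
            return False
    return True

print(check())
```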
## Extra Examples:
### File compression problem
You have $n$ files of different sizes $f_i$.
You want to merge them to create a single file. $merge(f_i,f_j)$ takes time $f_i+f_j$ and creates a file of size $f_k=f_i+f_j$.
Goal: Find the order of merges such that the total time to merge is minimized.
Thinking process: The merge process is a binary tree and each of the file is the leaf of the tree.
The total time required =$\sum^n_{i=1} d_if_i$, where $d_i$ is the depth of the file in the compression tree.
So compressing the smaller file first may yield a faster run time.
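A minimal sketch of this greedy strategy, assuming file sizes are given as a list of numbers; `min_merge_cost` is a hypothetical name, and a binary heap is one convenient way to repeatedly find the two smallest files.

```python
import heapq

def min_merge_cost(sizes):
    """Total time of repeatedly merging the two smallest files."""
    heap = list(sizes)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b               # merge(a, b) takes time a + b
        heapq.heappush(heap, a + b)  # the merged file goes back into the pool
    return total

print(min_merge_cost([2, 2, 3]))
```

For `[2, 2, 3]` the greedy order merges 2 and 2 (cost 4), then 4 and 3 (cost 7), for a total of 11; merging 2 and 3 first would cost 5 + 7 = 12.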
Proof:
#### Greedy Choice Property
Construct part of the solution by making a locally good decision.
Lemma: $\exists$ some optimal solution that merges the two smallest files first, let's say $[f_1,f_2]$
Proof: **Exchange argument**
* Case 1: The optimal solution already merges $f_1,f_2$, done. Beyond that, the order in which independent merges are performed does not change the cost.
* e.g. for [2,2,3,3] with merges (2,2) and (3,3), performing either merge first gives the same total cost
* Case 2: The optimal solution does not merge $f_1$ and $f_2$.
* Suppose the optimal solution merges $f_x,f_y$ as the deepest merge.
* Then $d_x\geq d_1,d_y\geq d_2$. Exchanging $f_1,f_2$ with $f_x,f_y$ yields a solution that is no worse, since $f_1,f_2$ are already the smallest.
#### Inductive Structure
* We can combine feasible solution to the subproblem $P'$ with the greedy choice to get a feasible solution to $P$
* After making greedy choice $q$, we are left with a strictly smaller subproblem $P'$ with the same optimality criteria of the original problem
Proof: **Optimal Substructure**: Show that if we solve the subproblem optimally, adding our first choice creates an optimal solution to the *whole* problem.
Let $q$ be the first choice, $P'$ be the subproblem left after making $q$ in problem $P$, $\Pi'$ be an optimal solution to $P'$. We claim that $\Pi=\Pi'\cup \{q\}$ is an optimal solution to $P$.
We prove this by contradiction.
Assume that $\Pi=\Pi'\cup\{q\}$ is not optimal.
By the greedy choice property (GCP), there exists an optimal solution $\Pi^*$ that contains $q$. Since $\Pi$ is not optimal, $cost(\Pi^*)<cost(\Pi)$. $\Pi^*-\{q\}$ is also a feasible solution to $P'$, and $cost(\Pi^*-\{q\})<cost(\Pi-\{q\})=cost(\Pi')$, which contradicts the assumption that $\Pi'$ is an optimal solution to $P'$.
Proof: **Smaller problem size**
After merging the two smallest files into one, we have strictly fewer files waiting to be merged.
#### Optimal Substructure
* We can combine optimal solution to the subproblem $P'$ with the greedy choice to get a optimal solution to $P$
Step 4 ignored, same for all greedy problems.
### Conclusion: Greedy Algorithm
* Algorithm
* Runtime Complexity
* Proof
* Greedy Choice Property
* Construct part of the solution by making a locally good decision.
* Inductive Structure
* We can combine feasible solution to the subproblem $P'$ with the greedy choice to get a feasible solution to $P$
* After making greedy choice $q$, we are left with a strictly smaller subproblem $P'$ with the same optimality criteria of the original problem
* Optimal Substructure
* We can combine optimal solution to the subproblem $P'$ with the greedy choice to get a optimal solution to $P$
* Standard Contradiction Argument simplifies it
## Review:
### Essence of master method
Let $a\geq 1$ and $b>1$ be constants, let $f(n)$ be a function, and let $T(n)$ be defined on the nonnegative integers by the recurrence
$$
T(n)=aT(\frac{n}{b})+f(n)
$$
where we interpret $n/b$ to mean either the ceiling or the floor of $n/b$. Let $c_{crit}=\log_b a$. Then $T(n)$ has the following asymptotic bounds.
* Case I: if $f(n) = O(n^{c})$ for some constant $c<c_{crit}$ ($n^{c_{crit}}$ "dominates" $f(n)$), then $T(n) = \Theta(n^{c_{crit}})$
* Case II: if $f(n) = \Theta(n^{c_{crit}})$ (neither $f(n)$ nor $n^{c_{crit}}$ dominates), then $T(n) = \Theta(n^{c_{crit}} \log n)$
Extension for $f(n)=\Theta(n^{c_{crit}}(\log n)^k)$
* if $k>-1$
$T(n)=\Theta(n^{c_{crit}}(\log n)^{k+1})$
* if $k=-1$
$T(n)=\Theta(n^{c_{crit}}\log \log n)$
* if $k<-1$
$T(n)=\Theta(n^{c_{crit}})$
* Case III: if $f(n) = \Omega(n^{c})$ for some constant $c>c_{crit}$ ($f(n)$ "dominates" $n^{c_{crit}}$), and if $af(n/b)\leq \delta f(n)$ for some constant $\delta<1$ and all sufficiently large $n$, then $T(n) = \Theta(f(n))$
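For the common special case $f(n)=\Theta(n^d)$, the three cases reduce to comparing $d$ with $c_{crit}$. The sketch below is a hypothetical helper (`master_polynomial` is not from the notes) and only covers polynomial $f(n)$, where the regularity condition of Case III holds automatically.

```python
import math

def master_polynomial(a, b, d):
    """Describe the solution of T(n) = a*T(n/b) + Theta(n^d)."""
    c_crit = math.log(a, b)
    if math.isclose(d, c_crit):
        # Case II: both terms contribute equally at every level
        return f"Theta(n^{d:g} log n)"
    if d < c_crit:
        # Case I: the leaves of the recursion tree dominate
        return f"Theta(n^{c_crit:.3g})"
    # Case III: the root (divide/combine work) dominates
    return f"Theta(n^{d:g})"

print(master_polynomial(2, 2, 1))  # merge sort style recurrence
print(master_polynomial(3, 2, 1))  # three half-size subproblems
```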

320
pages/CSE347/CSE347_L10.md Normal file

@@ -0,0 +1,320 @@
# Lecture 10
## Online Algorithms
### Example 1: Elevator
Problem: You've entered the lobby of a tall building, and want to go to the top floor as quickly as possible. There is an elevator which takes $E$ time to get to the top once it arrives. You can also take the stairs which takes $S$ time to climb (once you start) with $S>E$. However, you **do not know** when the elevator will arrive.
#### Offline (Clairvoyant) vs. Online
Offline: If you know that the elevator is arriving in $T$ time, then what will you do?
- Easy. I will compare $E+T$ with $S$ and take the smaller one.
Online: You do not know when the elevator will arrive.
- You can either wait for the elevator or take the stairs.
#### Strategies
**Always take the stairs.**
Your cost $S$,
Optimal Cost: $E$.
Your cost / Optimal cost = $\frac{S}{E}$.
$S$ could be arbitrarily large. For example, the Empire State Building has $103$ floors.
**Wait for the elevator**
Your cost $T+E$
Optimal Cost: $S$ (if $T$ is large)
Your cost / Optimal cost = $\frac{T+E}{S}$.
$T$ could be arbitrarily large. For an out-of-service elevator, $T$ could be infinite.
#### Online Algorithms
Definition: An online algorithm must take decisions **without** full information about the problem instance [in this case $T$] and/or it does not know the future [e.g. makes decision immediately as jobs come in without knowing the future jobs].
An **offline algorithm** has the full information about the problem instance.
### Competitive Ratio
Quality of online algorithm is quantified by the **competitive ratio** (Idea is similar to the approximation ratio in optimization).
Consider a problem $L$ (minimization) and let $l$ be an instance of this problem.
$C^*(l)$ is the cost of the optimal offline solution with full information and unlimited computational power.
$A$ is the online algorithm for $L$.
$C_A(l)$ is the value of $A$'s solution on $l$.
An online algorithm $A$ is $\alpha$-competitive if
$$
\frac{C_A(l)}{C^*(l)}\leq \alpha
$$
for all instances $l$ of the problem.
In other words, $\alpha=\max_l\frac{C_A(l)}{C^*(l)}$.
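For maximization problems, the ratio is taken the other way around (optimal value over the algorithm's value), and we again want the competitive ratio to be as small as possible.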
### Back to the Elevator Problem
**Strategy 1**: Always take the stairs. Ratio is $\frac{S}{E}$. can be arbitrarily large.
**Strategy 2**: Wait for the elevator. Ratio is $\frac{T+E}{S}$. can be arbitrarily large.
**Strategy 3**: We do not make a decision immediately. Let's wait for time $R$ and then take the stairs if the elevator has not arrived.
Question: What is the value of $R$? (how long to wait?)
Let's try $R=S$.
Claim: The competitive ratio is $2$.
Proof:
Case 1: The optimal offline solution takes the elevator, then $T+E\leq S$.
We also take the elevator.
Competitive ratio = $\frac{T+E}{T+E}=1$.
Case 2: The optimal offline solution takes the stairs, immediately.
We wait for time $R$ and then take the stairs. In the worst case, we wait for $R$ and then spend $S=R$ on the stairs.
Competitive ratio = $\frac{2R}{R}=2$.
EOP
Let's try $R=S-E$ instead.
Claim: The competitive ratio is $max\{1,2-\frac{E}{S}\}$.
Proof:
Case 1: The optimal offline solution takes the elevator, then $T+E\leq S$.
We also take the elevator.
Competitive ratio = $\frac{T+E}{T+E}=1$.
Case 2: The optimal offline solution takes the stairs, immediately.
We wait for time $R=S-E$ and then take the stairs.
Competitive ratio = $\frac{S-E+S}{S}=2-\frac{E}{S}$.
EOP
What if we wait less time? Let's try $R=S-E-\epsilon$ for some $\epsilon>0$
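In the worst case, the elevator arrives right after we leave: we wait for $S-E-\epsilon$ and then take the stairs for $S$, while the optimal offline solution takes the elevator at cost $(S-E-\epsilon)+E$.
Competitive ratio = $\frac{(S-E-\epsilon)+S}{(S-E-\epsilon)+E}=\frac{2S-E-\epsilon}{S-\epsilon}>2-\frac{E}{S}$.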
So the optimal competitive ratio is $max\{1,2-\frac{E}{S}\}$ when we wait for $S-E$ time.
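The case analysis can be checked numerically with a small sketch. The function names, the tie-breaking rule (take the elevator if it arrives exactly at time $R$), and the grid of arrival times are assumptions made for illustration.

```python
def online_cost(S, E, R, T):
    """Wait up to R for the elevator, then give up and take the stairs."""
    return T + E if T <= R else R + S

def offline_cost(S, E, T):
    """With T known in advance, pick the cheaper option."""
    return min(S, T + E)

def worst_ratio(S, E, R, horizon=30, ticks_per_unit=100):
    """Worst observed ratio over arrival times T = 0, 0.01, ..., horizon."""
    worst = 0.0
    for t in range(horizon * ticks_per_unit + 1):
        T = t / ticks_per_unit
        worst = max(worst, online_cost(S, E, R, T) / offline_cost(S, E, T))
    return worst

S, E = 10, 2
print(worst_ratio(S, E, R=S))      # waiting S
print(worst_ratio(S, E, R=S - E))  # waiting S - E
print(worst_ratio(S, E, R=S - E - 2))  # waiting less than S - E
```

With $S=10$ and $E=2$, waiting $S-E$ gives a worst ratio of $2-\frac{E}{S}=1.8$, waiting $S$ gives $2$, and waiting less than $S-E$ gives a ratio above $1.8$ on this grid.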
### Example 2: Cache Replacement
Cache: Data in a cache is organized in blocks (also called pages or cache lines).
If CPU accesses data that is already in the cache, it is called **cache hit**, then access is fast.
If CPU accesses data that is not in the cache, it is called **cache miss**. This block is brought to the cache from main memory. If the cache already has $k$ blocks (full), then another block needs to be **kicked out** (eviction).
Goal: Minimize the number of cache misses.
**Clairvoyant policy**: Knows what will be accessed in the future and the sequence of accesses.
FIF: Farthest in the future
Example: $k=3$, cache has $3$ blocks.
Sequence: $A B C D C A B$
Cache: $A B C$ (3 warm-up misses), then evict $B$ for $D$ (1 miss). $C$ and $A$ hit, and the final access to $B$ is 1 more miss.
Online Algorithm: Least recently used (LRU)
LRU: least recently used.
Example: $A B C D C A B$
Cache: $A B C$, then evict $A$ for $D$. 3 warm-up misses and 1 miss.
Cache: $D B C$, then evict $B$ for $A$. 1 miss.
Cache: $D A C$, then evict $D$ for $B$. 1 miss.
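Both policies can be simulated on the example sequence. This is a sketch under the assumptions that the cache starts empty and that warm-up accesses count as misses; `lru_misses` and `fif_misses` are hypothetical names.

```python
from collections import OrderedDict

def lru_misses(seq, k):
    """Count misses of least-recently-used eviction with cache size k."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for x in seq:
        if x in cache:
            cache.move_to_end(x)
        else:
            misses += 1
            if len(cache) == k:
                cache.popitem(last=False)  # evict the least recently used
            cache[x] = True
    return misses

def fif_misses(seq, k):
    """Count misses of farthest-in-the-future (clairvoyant) eviction."""
    cache = set()
    misses = 0
    for t, x in enumerate(seq):
        if x in cache:
            continue
        misses += 1
        if len(cache) == k:
            def next_use(block):
                for u in range(t + 1, len(seq)):
                    if seq[u] == block:
                        return u
                return float("inf")  # never used again
            cache.remove(max(cache, key=next_use))
        cache.add(x)
    return misses

print(lru_misses("ABCDCAB", 3), fif_misses("ABCDCAB", 3))
```

On `ABCDCAB` with $k=3$, LRU incurs 6 misses and FIF incurs 5, counting the 3 warm-up misses in both.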
#### Competitive Ratio for LRU
Claim: LRU is $k+1$-competitive.
Proof:
Split the sequence into subsequences such that each subsequence contains $k+1$ distinct blocks.
For example, suppose $k=3$, sequence $ABCDCEFGEA$, subsequences are $ABCDC$ and $EFGEA$.
LRU Cache: In each subsequence, it has at most $k+1$ misses.
The optimal offline solution: In each subsequence, must have at least $1$ miss.
So the competitive ratio is at most $k+1$.
EOP
Using similar analysis, we can show that LRU is $k$ competitive.
Hint for the proof:
Split the sequence into subsequences such that each subsequence LRU has $k$ misses.
Argue that OPT has at least $1$ miss in each subsequence.
EOP
#### Many sensible algorithms are $k$-competitive
**Lower Bound**: No deterministic online algorithm is better than $k$-competitive.
**Resource augmentation**: Offline algorithm (which knows the future) has $k$ cache lines in its cache and the online algorithm has $ck$ cache lines with $c>1$.
##### Lemma: Competitive Ratio is $\sim \frac{c}{c-1}$
Say $c=2$. The LRU cache is twice as large as the offline cache. LRU is $2$-competitive.
Proof:
LRU has a cache of size $ck$.
Divide the sequence into subsequences such that you have $ck$ distinct pages.
In each subsequence, LRU has at most $ck$ misses.
The OPT has at least $(c-1)k$ misses.
So competitive ratio is at most $\frac{ck}{(c-1)k}=\frac{c}{c-1}$.
_Actual competitive ratio is $\sim \frac{c}{c-1+\frac{1}{k}}$._
EOP
### Conclusion
- Definition: some information unknown
- Clairvoyant vs. Online
- Competitive Ratio
- Example:
- Elevator
- Cache Replacement
### Example 3: Pessimal cache problem
Maximize number of cache misses.
Maximization problem: competitive ratio is $max\{\frac{\text{cost of the optimal offline algorithm}}{\text{cost of our algorithm}}\}$.
Or get $min\{\frac{\text{cost of our algorithm}}{\text{cost of the optimal offline algorithm}}\}$.
The size of the cache is $k$.
So if OPT has $X$ cache misses, we want $\geq \frac{X}{\alpha}$. cache misses where $\alpha$ is the competitive ratio.
Claim: The OPT could always miss (not quite) except when the page is accessed twice in a row.
Claim: No deterministic online algorithm has a bounded competitive ratio. (that is independent of the length of the sequence)
Proof:
Start with an empty cache. (size of cache is $k$)
Miss the first $k$ unique pages.
$P_1,P_2,\cdots,P_k|P_{k+1},P_{k+2},\cdots,P_{2k}$
Say your deterministic online algorithm chooses to evict $P_i$ for $i\in\{1,2,\cdots,k\}$.
We want to choose $P_i$ for $i\in\{1,2,\cdots,k\}$ such that the number of misses is maximized.
The optimal offline solution: evict the page that will be accessed furthest in the future. Let's call it $\sigma$.
The online algorithm: evict $P_i$ for $i\in\{1,2,\cdots,k\}$. Will have $k+1$ misses in the worst case.
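So the competitive ratio is at least $\frac{\sigma}{k+1}$, where $\sigma$ is the number of misses of the optimal offline solution. Since $\sigma$ grows with the length of the sequence, the ratio is unbounded.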
#### Randomized most recently used (RAND, MRU)
MRU without randomization is a deterministic algorithm, and thus its competitive ratio is unbounded.
The first $k$ unique accesses bring all pages into the cache.
On the $k+1$th access, pick a random page from the cache and evict it.
After that, evict the MRU page on a miss.
Claim: RAND is $k$-competitive.
#### Lemma: After the first $k+1$ unique accesses at all times
1. 1 page is in the cache with probability 1 (the MRU one)
2. There exist $k$ pages each of which is in the cache with probability $1-\frac{1}{k}$
3. All other pages are in the cache with probability $0$.
Proof:
By induction.
Base case: right after the first $k+1$ unique accesses and before $k+2$th access.
1. $P_{k+1}$ is in the cache with probability $1$.
2. When we brought $P_{k+1}$ to the cache, we evicted one page uniformly at random. (i.e. $P_i$ is evicted with probability $\frac{1}{k}$, $P_i$ is still in the cache with probability $1-\frac{1}{k}$)
3. All other $r$ pages are definitely not in the cache because we did not see them yet.
Inductive cases:
Let $P$ be a page that is in the cache with probability $0$
Accessing $P$ is a cache miss, and RAND MRU evicts the MRU page $P'$ to bring in $P$, so $P'$ is now in the cache with probability $0$.
1. $P$ is in the cache with probability $1$.
2. By induction, there exists a set of $k$ pages each of which is in the cache with probability $1-\frac{1}{k}$.
3. All other pages are in the cache with probability $0$.
Let $P$ be a page in the cache with probability $1-\frac{1}{k}$.
With probability $\frac{1}{k}$, $P$ is not in the cache and RAND evicts $P'$ in the cache and brings $P$ to the cache.
EOP
MRU is $k$-competitive.
Proof:
Case 1: Access MRU page.
Both OPT and our algorithm don't miss.
Case 2: Access some other page
OPT definitely misses.
RAND MRU misses with probability $\geq \frac{1}{k}$.
Let's define the random variable $X$ as the number of misses of RAND MRU.
$E[X]\leq 1+\frac{1}{k}$.
EOP


@@ -0,0 +1 @@
# Lecture 11


@@ -0,0 +1 @@
# Lecture 12


@@ -0,0 +1 @@
# Lecture 13


@@ -0,0 +1 @@
# Lecture 14


@@ -0,0 +1 @@
# Lecture 15

334
pages/CSE347/CSE347_L2.md Normal file

@@ -0,0 +1,334 @@
# Lecture 2
## Divide and conquer
Review of CSE 247
1. Divide the problem into (generally equal) smaller subproblems
2. Recursively solve the subproblems
3. Combine the solutions of subproblems to get the solution of the original problem
- Examples: Merge Sort, Binary Search
Recurrence
Master Method:
$$
T(n)=aT(\frac{n}{b})+\Theta(f(n))
$$
### Example 1: Multiplying 2 numbers
Normal Algorithm:
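```python
def multiply(x, y):
    # grade-school (shift-and-add) multiplication of non-negative integers
    p = 0
    shift = 0
    while y:
        # add a shifted copy of x for every set bit of y
        if y & 1:
            p += x << shift
        y >>= 1
        shift += 1
    return p
```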
divide and conquer approach
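```python
def multiply(x, y):
    # x, y are non-negative integers; n is the larger bit length
    n = max(x.bit_length(), y.bit_length())
    if n <= 1:
        return x * y
    h = n // 2
    # split into high and low halves: x = xh * 2^h + xl
    xh, xl = x >> h, x & ((1 << h) - 1)
    yh, yl = y >> h, y & ((1 << h) - 1)
    return (multiply(xh, yh) << (2 * h)) + ((multiply(xh, yl) + multiply(yh, xl)) << h) + multiply(xl, yl)
```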
$$
T(n)=4T(n/2)+\Theta(n)=\Theta(n^2)
$$
Not a useful optimization
But,
$$
multiply(xh,yl)+multiply(yh,xl)=multiply(xh,yh)+multiply(xl,yl)-multiply(xh-xl,yh-yl)
$$
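To see where the three-multiplication form comes from, expand the product of the differences, writing $x_h,x_l,y_h,y_l$ for `xh,xl,yh,yl`:

$$
(x_h-x_l)(y_h-y_l)=x_hy_h-x_hy_l-x_ly_h+x_ly_l
$$

Rearranging gives the cross term in terms of products that are computed anyway:

$$
x_hy_l+x_ly_h=x_hy_h+x_ly_l-(x_h-x_l)(y_h-y_l)
$$

So only the three products $x_hy_h$, $x_ly_l$ and $(x_h-x_l)(y_h-y_l)$ are needed per level.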
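```python
def multiply(x, y):
    # handle signs first: xh-xl and yh-yl below may be negative
    if x < 0 or y < 0:
        p = multiply(abs(x), abs(y))
        return p if (x < 0) == (y < 0) else -p
    n = max(x.bit_length(), y.bit_length())
    if n <= 1:
        return x * y
    h = n // 2
    xh, xl = x >> h, x & ((1 << h) - 1)
    yh, yl = y >> h, y & ((1 << h) - 1)
    zhh = multiply(xh, yh)
    zll = multiply(xl, yl)
    # cross term: xh*yl + xl*yh = zhh + zll - (xh-xl)*(yh-yl)
    mid = zhh + zll - multiply(xh - xl, yh - yl)
    return (zhh << (2 * h)) + (mid << h) + zll
```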
$$
T(n)=3T(n/2)+\Theta(n)=\Theta(n^{\log_2 3})\approx \Theta(n^{1.58})
$$
### Example 2: Closest Pairs
Input: $P$ is a set of $n$ points in the plane. $p_i=(x_i,y_i)$
$$
d(p_i,p_j)=\sqrt{(x_i-x_j)^2+(y_i-y_j)^2}
$$
Goal: Find the distance between the closest pair of points.
Naive algorithm: iterate over all pairs ($\Theta(n^2)$).
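A minimal sketch of the naive algorithm, assuming points are `(x, y)` tuples and Euclidean distance; `closest_pair_naive` is a hypothetical name. It is also handy as a reference when testing the divide and conquer versions.

```python
import math
from itertools import combinations

def closest_pair_naive(points):
    """Distance between the closest pair, checking all O(n^2) pairs."""
    best = float("inf")  # fewer than 2 points: no pair exists
    for p, q in combinations(points, 2):
        best = min(best, math.dist(p, q))
    return best

print(closest_pair_naive([(0, 0), (3, 4), (1, 1)]))
```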
Divide and conquer algorithm:
Preprocessing: Sort $P$ by $x$ coordinate to get $P_x$.
Base case:
- 1 point: closest d = inf
- 2 points: closest d = d(p_1,p_2)
Divide Step:
Compute mid point and get $Q, R$.
Recursive step:
- $d_l$ closest pair in $Q$
- $d_r$ closest pair in $R$
Combine step:
Calculate $d_c$, the distance of the closest pair such that one point is on the left side and the other is on the right.
return $min(d_c,d_l,d_r)$
Total runtime:
$$
T(n)=2T(n/2)+\Theta(n^2)
$$
This is still $\Theta(n^2)$, so nothing is gained.
Important Insight: Can reduce the number of checks
**Lemma:** If all points within this square are at least $\delta=min\{d_r,d_l\}$ apart, there are at most 4 points in this square.
A better algorithm:
1. Divide $P_x$ into 2 halves using the mid point
2. Recursively compute $d_l$ and $d_r$, take $\delta=min(d_l,d_r)$.
3. Filter points into y-strip: points which are within $(mid_x-\delta,mid_x+\delta)$
4. Sort y-strip by y coordinate. For every point $p$, we look at this y-strip in sorted order starting at this point and stop when we see a point with y coordinate $>p_y +\delta$
```python
# d is the distance function
def closestP(P, d):
    Px = sorted(P, key=lambda p: p[0])
    def closestPRec(P, d):
        # P is sorted by x coordinate
        n = len(P)
        if n <= 1:
            return float('inf')
        if n == 2:
            return d(P[0], P[1])
        Q, R = P[:n//2], P[n//2:]
        midx = R[0][0]
        dl, dr = closestPRec(Q, d), closestPRec(R, d)
        dc = min(dl, dr)
        # y-strip: points within dc of the dividing line, sorted by y
        ys = sorted((p for p in P if midx - dc < p[0] < midx + dc), key=lambda p: p[1])
        yn = len(ys)
        # each point is compared with O(1) later points (the loop itself is O(n))
        for i in range(yn):
            for j in range(i + 1, yn):
                # stop once the y gap alone is at least dc
                if ys[j][1] - ys[i][1] >= dc:
                    break
                dc = min(dc, d(ys[i], ys[j]))
        return dc
    return closestPRec(Px, d)
```
Runtime analysis:
$$
T(n)=2T(n/2)+\Theta(n\log n)=\Theta(n\log^2 n)
$$
We can do even better by presorting Y
1. Divide $P_x$ into 2 halves using the mid point
2. Recursively compute $d_l$ and $d_r$, take $\delta=min(d_l,d_r)$.
3. Filter points into y-strip: points which are within $(mid_x-\delta,mid_x+\delta)$ by visiting presorted $P_y$
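```python
# d is the distance function
def closestP(P, d):
    Px = sorted(P, key=lambda p: p[0])
    Py = sorted(P, key=lambda p: p[1])
    def closestPRec(Px, Py):
        # Px and Py hold the same points, sorted by x and by y
        n = len(Px)
        if n <= 1:
            return float('inf')
        if n == 2:
            return d(Px[0], Px[1])
        Q, R = Px[:n//2], Px[n//2:]
        midx = R[0][0]
        # split the presorted Py in O(n), keeping y order
        # (assumes points are distinct, hashable tuples)
        inQ = set(Q)
        Qy = [p for p in Py if p in inQ]
        Ry = [p for p in Py if p not in inQ]
        dc = min(closestPRec(Q, Qy), closestPRec(R, Ry))
        # the y-strip is already sorted by y because Py is
        ys = [p for p in Py if midx - dc < p[0] < midx + dc]
        yn = len(ys)
        # each point is compared with O(1) later points (the loop itself is O(n))
        for i in range(yn):
            for j in range(i + 1, yn):
                if ys[j][1] - ys[i][1] >= dc:
                    break
                dc = min(dc, d(ys[i], ys[j]))
        return dc
    return closestPRec(Px, Py)
```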
Runtime analysis:
$$
T(n)=2T(n/2)+\Theta(n)=\Theta(n\log n)
$$
## In-person lectures
$$
T(n)=aT(n/b)+f(n)
$$
$a$ is the number of subproblems, $n/b$ is the size of each subproblem, and $f(n)$ is the cost of the divide and combine steps.
### Example 3: Max Contiguous Subsequence Sum (MCSS)
Given: array of integers (positive or negative), $S=[s_1,s_2,...,s_n]$
Return: $max\{\sum^j_{k=i} s_k|1\leq i\leq n, i\leq j\leq n\}$
Trivial solution:
brute force
$O(n^3)$
A bit better solution:
$O(n^2)$ use prefix sum to reduce cost for sum.
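A minimal sketch of the prefix-sum idea, assuming a non-empty list of integers; `mcss_prefix` is a hypothetical name. With `pfx[i] = sum(S[:i])`, any range sum is a difference of two prefix sums, so each of the $O(n^2)$ ranges costs $O(1)$.

```python
def mcss_prefix(S):
    """Maximum contiguous subsequence sum in O(n^2) using prefix sums."""
    n = len(S)
    pfx = [0]
    for v in S:
        pfx.append(pfx[-1] + v)
    # sum(S[i:j]) == pfx[j] - pfx[i] for 0 <= i < j <= n
    return max(pfx[j] - pfx[i] for i in range(n) for j in range(i + 1, n + 1))

print(mcss_prefix([-2, 1, -3, 4, -1, 2, 1, -5, 4]))
```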
Divide and conquer solution.
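```python
def MCSS(S):
    # all ranges are half-open: S[i:j], and S is assumed non-empty
    def MCSSMid(S, i, j, mid):
        # brute force over subarrays S[l:r+1] that cross mid (l < mid <= r)
        res = S[mid - 1] + S[mid]
        for l in range(i, mid):
            curS = sum(S[l:mid])
            for r in range(mid, j):
                curS += S[r]
                res = max(res, curS)
        return res
    def MCSSRec(i, j):
        if j - i == 1:
            return S[i]
        mid = (i + j) // 2
        L, R = MCSSRec(i, mid), MCSSRec(mid, j)
        C = MCSSMid(S, i, j, mid)
        return max(L, C, R)
    return MCSSRec(0, len(S))
```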
If `MCSSMid(S,i,j,mid)` use trivial solution, the running time is:
$$
T(n)=2T(n/2)+O(n^2)=\Theta(n^2)
$$
and we did nothing.
Observations: Any contiguous subsequence that starts on the left and ends on the right can be split into two parts as `sum(S[i:j])=sum(S[i:mid])+sum(S[mid:j])`
and let $LS$ be the largest sum of a subsequence on the left that ends at mid, and $RS$ be the largest sum of a subsequence on the right that starts at mid.
**Lemma:** The biggest subsequence that contains `S[mid]` has sum $x=LS+RS$
Proof:
By contradiction,
Assume for the sake of contradiction that $y=L'+R'$ is the sum of such a subsequence that is larger than $x$ ($y>x$).
Let $z=LS+R'$. Since $LS\geq L'$ by definition of $LS$, $z\geq y$. Similarly, $RS\geq R'$, so $x=LS+RS\geq z\geq y$, which contradicts $y>x$.
Optimized function as follows:
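```python
def MCSS(S):
    # all ranges are half-open: S[i:j], and S is assumed non-empty
    def MCSSMid(S, i, j, mid):
        # best sum of a subarray of S[i:j] that contains S[mid]
        res = S[mid]
        LS, RS = 0, 0
        cl, cr = 0, 0
        for l in range(mid - 1, i - 1, -1):
            cl += S[l]
            LS = max(LS, cl)
        for r in range(mid + 1, j):
            cr += S[r]
            RS = max(RS, cr)
        return res + LS + RS
    def MCSSRec(i, j):
        if j - i == 1:
            return S[i]
        mid = (i + j) // 2
        L, R = MCSSRec(i, mid), MCSSRec(mid, j)
        C = MCSSMid(S, i, j, mid)
        return max(L, C, R)
    return MCSSRec(0, len(S))
```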
The running time is:
$$
T(n)=2T(n/2)+O(n)=\Theta(n\log n)
$$
Strengthening the recursion:
```python
def MCSS(S):
    def MCSSRec(i, j):
        # returns (best sum, best prefix sum, best suffix sum, total sum) of S[i:j]
        if j - i == 1:
            return S[i], S[i], S[i], S[i]
        mid = (i + j) // 2
        L, lp, ls, sl = MCSSRec(i, mid)
        R, rp, rs, sr = MCSSRec(mid, j)
        return max(L, R, ls + rp), max(lp, sl + rp), max(rs, sr + ls), sl + sr
    return MCSSRec(0, len(S))[0]
```
Pre-compute version:
```python
def MCSS(S):
    n = len(S)
    # pfx[i] = sum(S[:i]), so sum(S[i:j]) = pfx[j] - pfx[i]
    pfx = [0]
    for v in S:
        pfx.append(pfx[-1] + v)
    def MCSSRec(i, j):
        # returns (best sum, best prefix sum, best suffix sum) of S[i:j]
        if j - i == 1:
            return S[i], S[i], S[i]
        mid = (i + j) // 2
        L, lp, ls = MCSSRec(i, mid)
        R, rp, rs = MCSSRec(mid, j)
        suml, sumr = pfx[mid] - pfx[i], pfx[j] - pfx[mid]
        return max(L, R, ls + rp), max(lp, suml + rp), max(rs, sumr + ls)
    return MCSSRec(0, n)[0]
```
$$
T(n)=2T(n/2)+O(1)=\Theta(n)
$$

161
pages/CSE347/CSE347_L3.md Normal file

@@ -0,0 +1,161 @@
# Lecture 3
## Dynamic programming
When we cannot find a good Greedy Choice, the only thing we can do is to iterate all choices.
### Example 1: Edit distance
Input: 2 sequences of some character set, e.g.
$S=ABCADA$, $T=ABADC$
Goal: Compute the minimum number of **insertions or deletions** you could do to convert $S$ into $T$
We will call it `Edit Distance(S[1...n],T[1...m])`. where `n` and `m` be the length of `S` and `T` respectively.
Idea: compute the difference between the sequences.
Observe: The first difference appears at index 3. In this short example it is obvious that deleting 'C' is better, but for long sequences we do not know what the rest of the sequences looks like, so it is hard to decide whether to insert 'A' or delete 'C'.
Use branching algorithm:
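```python
def editDist(S, T, i, j):
    # edit distance between S[i:] and T[j:]
    if len(S) <= i:
        # S exhausted: insert the rest of T
        return len(T) - j
    if len(T) <= j:
        # T exhausted: delete the rest of S
        return len(S) - i
    if S[i] == T[j]:
        return editDist(S, T, i + 1, j + 1)
    else:
        # one deletion (skip S[i]) or one insertion (skip T[j])
        return 1 + min(editDist(S, T, i + 1, j), editDist(S, T, i, j + 1))
```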
Correctness Proof Outline:
- ~~Greedy Choice Property~~
- Complete Choice Property:
- The optimal solution makes **one** of the choices that we consider
- Inductive Structure:
- Once you make **any** choice, you are left with a smaller problem of the same type. **Any** first choice + **feasible** solution to the subproblem = feasible solution to the entire problem.
- Optimal Substructure:
- If we optimally solve the subproblem for **a particular choice c**, and combine it with c, resulting solution is the **optimal solution that makes choice c**.
Correctness Proof:
Claim: For any problem $P$, the branching algorithm finds the optimal solution.
Proof: Induct on problem size
- Base case: $|S|=0$ or $|T|=0$, obvious
- Inductive Case: By inductive hypothesis: Branching algorithm works for all smaller problems, either $S$ is smaller or $T$ is smaller or both
- For each choice we make, we got a strictly smaller problem: by inductive structure, and the answer is correct by inductive hypothesis.
- By Optimal substructure, we know for any choice, the solution of branching algorithm for subproblem and the choice we make is an optimal solution for that problem.
- Using Complete choice property, we considered all the choices.
Viewing the recursion as a tree, the leftmost and rightmost branches have height $n$, while the middle branches have height up to $2n$. So the running time is $\Omega(2^n)$.
#### How could we reduce the complexity?
There are **overlapping subproblems** that we compute more than once! Number of distinct subproblems is polynomial, we can **share the solution** that we have already computed!
**Store the result of each subproblem in a 2D array**
Use dp:
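```python
def editDist(S, T):
    m, n = len(S), len(T)
    # dp[i][j] = edit distance between S[i:] and T[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][n] = m - i  # T exhausted: delete the rest of S
    for j in range(n + 1):
        dp[m][j] = n - j  # S exhausted: insert the rest of T
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if S[i] == T[j]:
                dp[i][j] = dp[i + 1][j + 1]
            else:
                # assuming the cost of insertion and deletion is 1
                dp[i][j] = min(1 + dp[i][j + 1], 1 + dp[i + 1][j])
    return dp[0][0]
```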
We can use backtracking to find out how we reached the final answer. The runtime is then the time needed to fill the table, which is $T(n,m)=\Theta(mn)$
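The backtracking step can be sketched as follows. This is an illustrative assumption about how the table is traced, not the lecture's own code: `edit_script` is a hypothetical name, `dp[i][j]` is the distance between `S[i:]` and `T[j:]`, and ties are broken in favour of deletion.

```python
def edit_script(S, T):
    """Return (distance, list of operations) turning S into T
    using unit-cost insertions and deletions only."""
    m, n = len(S), len(T)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][n] = m - i
    for j in range(n + 1):
        dp[m][j] = n - j
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if S[i] == T[j]:
                dp[i][j] = dp[i + 1][j + 1]
            else:
                dp[i][j] = 1 + min(dp[i + 1][j], dp[i][j + 1])
    # walk the table from (0, 0), replaying the choice made at each cell
    ops, i, j = [], 0, 0
    while i < m and j < n:
        if S[i] == T[j]:
            i, j = i + 1, j + 1
        elif dp[i][j] == 1 + dp[i + 1][j]:
            ops.append(("delete", S[i]))
            i += 1
        else:
            ops.append(("insert", T[j]))
            j += 1
    ops += [("delete", c) for c in S[i:]]
    ops += [("insert", c) for c in T[j:]]
    return dp[0][0], ops

dist, ops = edit_script("ABCADA", "ABADC")
print(dist, ops)
```

The trace itself takes $O(m+n)$ steps, so it does not change the $\Theta(mn)$ bound.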
### Example 2: Weighted Interval Scheduling (IS)
Input: $P=\{p_1,p_2,...,p_n\}$, $p_i=\{s_i,f_i,w_i\}$
$s_i$ is the start time, $f_i$ is the finish time, $w_i$ is the weight of the task for job $i$
Goal: Pick a set of **non-overlapping** intervals $\Pi$ such that $\sum_{p_i\in \Pi} w_i$ is maximized.
Trivial solution ($T(n)=O(2^n)$)
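```python
# p=[[s_i,f_i,w_i],...]
def intervalScheduling(p):
    p = sorted(p)  # sort by start time
    n = len(p)
    def best(idx, last_end):
        # max weight using jobs p[idx:], given the last chosen job ends at last_end
        if idx >= n:
            return 0
        # choice 1: skip job idx
        res = best(idx + 1, last_end)
        # choice 2: pick job idx if it starts after the last chosen job ends
        if p[idx][0] >= last_end:
            res = max(res, p[idx][2] + best(idx + 1, p[idx][1]))
        return res
    return best(0, float('-inf'))
```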
Using dp ($T(n)=O(n^2)$)
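```python
import bisect

def intervalScheduling(p):
    # p=[[s_i,f_i,w_i],...], sorted by start time
    p = sorted(p)
    n = len(p)
    starts = [x[0] for x in p]
    # dp[i] = best total weight using only jobs p[i:]
    dp = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        # load initial best case: do nothing
        dp[i] = dp[i + 1]
        _, e, w = p[i]
        # try every job j that starts at or after the end of job i
        for j in range(bisect.bisect_left(starts, e), n + 1):
            dp[i] = max(dp[i], w + dp[j])
    return dp[0]
```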
### Example 3: Subset sums
Input: a set $S$ of positive and unique integers and another integer $K$.
Problem: Is there a subset $X\subseteq S$ such that $sum(X)=K$
Brute force takes $O(2^n)$.
```python
def subsetSum(arr, i, k) -> bool:
    # decide whether some subset of arr[i:] sums to k
    if i >= len(arr):
        return k == 0
    # either include arr[i] or leave it out
    return subsetSum(arr, i + 1, k - arr[i]) or subsetSum(arr, i + 1, k)
```
Using dp $O(nk)$
```python
def subsetSum(arr, k) -> bool:
    # dp[i] is True if some subset of the elements seen so far sums to i
    dp = [False] * (k + 1)
    dp[0] = True
    for e in arr:
        ndp = []
        for i in range(k + 1):
            ndp.append(dp[i])
            if i - e >= 0:
                ndp[i] |= dp[i - e]
        dp = ndp
    return dp[-1]
```

321
pages/CSE347/CSE347_L4.md Normal file

@@ -0,0 +1,321 @@
# Lecture 4
## Maximum Flow
### Example 1: Ship cement from factory to building
Input $s$: source, $t$: destination
Graph with **directed** edges weights on each edge: **capacity**
**Goal:** Ship as much stuff as possible while obeying capacity constraints.
Graph: $(V,E)$ directed and weighted
- Unique source and sink nodes $\to s, t$
- Each edge has capacity $c(e)$ [Integer]
A valid flow assignment assigns an integer $f(e)$ to each edge s.t.
Capacity constraint: $0\leq f(e)\leq c(e)$
Flow conservation:
$$
\sum_{e\in E_{in}(v)}f(e)=\sum_{e\in E_{out}(v)}f(e),\forall v\in V-\{s,t\}
$$
$E_{in}(v)$: set of incoming edges to $v$
$E_{out}(v)$: set of outgoing edges from $v$
Compute: Maximum Flow: Find a valid flow assignment to
Maximize $|F|=\sum_{e\in E_{in}(t)}f(e)=\sum_{e\in E_{out}(s)}f(e)$ (total units received by end and sent by source)
Additional assumptions
1. $s$ has no incoming edges, $t$ has no outgoing edges
2. You do not have a cycle of 2 nodes
A proposed algorithm:
1. Find a path from $s$ to $t$
2. Push as much flow along the path as possible
3. Adjust the capacities
4. Repeat until we cannot find a path
**Residual Graph:** If there is an edge $e=(u,v)$ in $G$, we will add a back edge $\bar{e}=(v,u)$. Capacity of $\bar{e}=$ flow on $e$. Call this graph $G_R$.
Algorithm:
- Find an "augmenting path" $P$.
- $P$ can contain forward or backward edges!
- Say the smallest residual capacity along the path is $k$.
- Push $k$ flow on the path ($f(e) =f(e) + k$ for all edges on path $P$)
- Reduce the capacity of all edges on the path $P$ by $k$
- **Increase** the capacity of the corresponding mirror/back edges
- Repeat until there are no augmenting paths
### Formalize: Ford-Fulkerson (FF) Algorithm
1. Initialize the residual graph $G_R=G$
2. Find an augmenting path $P$ with capacity $k$ (min capacity of any edge on $P$)
3. Fix up the residual capacities in $G_R$
- $c(e)=c(e)-k,\forall e\in P$
- $c(\bar{e})=c(\bar{e})+k$ for the mirror edge $\bar{e}$ of every $e\in P$
4. Repeat 2 and 3 until no augmenting path can be found in $G_R$.
```python
from collections import defaultdict

def ford_fulkerson_algo(G,n,s,t):
    """
    Args:
        G: the graph for max_flow, G[u] is a dict {v: capacity of edge (u,v)}
        n: the number of vertices in the graph
        s: start vertex of flow
        t: end vertex of flow
    Returns:
        the max flow in graph from s to t
    """
    # Initialize the residual graph $G_R=G$: copy the forward capacities;
    # back edges start with capacity 0 (the defaultdict default)
    GR=[defaultdict(int) for _ in range(n)]
    for u in range(n):
        for v,w in G[u].items():
            GR[u][v]+=w
    def augP(cur,visited,path):
        # Find an augmenting path $P$ by DFS over edges with residual capacity > 0
        if cur==t: return True
        visited.add(cur)
        for v,w in GR[cur].items():
            if w==0 or v in visited: continue
            path.append((cur,v))
            if augP(v,visited,path): return True
            path.pop()
        return False
    flow=0
    while True:
        path=[]
        if not augP(s,set(),path): break
        # capacity $k$ of the path (min capacity of any edge on $P$)
        k=min(GR[a][b] for a,b in path)
        # Fix up the residual capacities in $G_R$
        # - $c(e)=c(e)-k,\forall e\in P$
        # - $c(\bar{e})=c(\bar{e})+k$ for the mirror edge of every $e\in P$
        for a,b in path:
            GR[a][b]-=k
            GR[b][a]+=k
        flow+=k
    return flow
```
#### Proof of Correctness: Valid Flow
**Lemma 1:** FF finds a valid flow
- Capacity and conservation constraints are not violated
- Capacity constraint: $0\leq f(e)\leq c(e)$
- Flow conservation: $\sum_{e\in E_{in}(v)}f(e)=\sum_{e\in E_{out}(v)}f(e),\forall v\in V-\{s,t\}$
Proof: We proceed by induction on **augmenting paths**
##### Base Case
$f(e)=0$ on all edges
##### Inductive Case
By inductive hypothesis, we have a valid flow and the corresponding residual graph $G_R$.
Inductive Step:
Now we find an augmenting path $P$ in $G_R$ and push $k$ units along it (where $k$ is the smallest residual capacity on $P$). We argue that the constraints are not violated.
**Capacity Constraints:** Consider an edge $e$ in $P$.
- If $e$ is a forward edge (in the original graph)
    - by construction of $G_R$, it had leftover capacity $\geq k$, so $f(e)+k\leq c(e)$.
- If $e$ is a back edge with residual capacity $\geq k$
    - the flow on the real edge decreases by $k$ but stays $\geq 0$, so no capacity constraint is violated.
**Conservation Constraints:** Consider a vertex $v$ on path $P$
1. Both forward edges
    - No violation, push $k$ flow into $v$ and out.
2. Both back edges
    - No violation, push $k$ less flow into $v$ and out.
3. Redirecting flow
    - No violation, change of $0$ by $k-k$ on $v$.
#### Proof of Correctness: Termination
**Lemma 2:** FF terminates
Proof:
Every augmenting path it finds increases the total flow by at least 1 (capacities are integers).
So it must terminate when it reaches a max flow, or before.
Each iteration takes $\Theta(m+n)$ time to find an augmenting path.
The number of iterations is $\leq |F|$, so the total is $O(|F|(m+n))$ (not polynomial time)
#### Proof of Correctness: Optimality
From Lemma 1 and 2, we know that FF returns a feasible solution, but does it return the **maximum** flow?
##### Max-flow Min-cut Theorem
Given a graph $G(V,E)$, a **graph cut** is a partition of vertices into 2 subsets.
- $S$: $s$ + maybe some other vertices
- $V-S$: $t$ + maybe some other vertices
Define the capacity $C(S)$ of the cut to be the sum of the capacities of edges that go from a vertex in $S$ to a vertex in $V-S$.
**Lemma 3:** For all valid flows $f$, $|f|\leq C(S)$ for every cut $S$ (Max-flow $\leq$ Min-cut)
Proof: all flow must go through one of the cut edges.
**Min-cut:** cut of smallest capacity, $S^*$. $|f|\leq C(S^*)$
**Lemma 4:** FF produces a flow $=C(S^*)$
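Proof: Let $\hat{f}$ be the flow found by FF. No augmenting paths remain in $G_R$.
Let $\hat{S}$ be all vertices that can be reached from $s$ in $G_R$ using edges with capacities $>0$. Since there is no augmenting path, $t\notin\hat{S}$, so $\hat{S}$ is a cut.
All the forward edges going out of the cut are saturated (otherwise their endpoints would be reachable). Since the back edges leaving $\hat{S}$ have capacity 0, no flow is going into the cut $\hat{S}$.
If some flow was coming in from $V-\hat{S}$, then there would be a back edge out of $\hat{S}$ with capacity $>0$, a contradiction. So $|\hat{f}|=C(\hat{S})\geq C(S^*)$, and together with Lemma 3, $|\hat{f}|=C(S^*)$.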
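
The construction of $\hat{S}$ in this proof can be sketched in code. This is an illustrative sketch, not the notes' implementation: it assumes the graph is given as a dict `cap` mapping `(u, v)` to an integer capacity, it uses BFS to find augmenting paths (a choice of this sketch), and the name `max_flow_min_cut` is hypothetical.

```python
from collections import defaultdict, deque

def max_flow_min_cut(cap, s, t):
    """cap: dict {(u, v): capacity}. Returns (flow, S_hat, capacity of cut S_hat)."""
    res = defaultdict(int)          # residual capacities
    adj = defaultdict(set)
    for (u, v), c in cap.items():
        res[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)               # back edge, residual capacity 0 initially

    def reachable():
        # BFS from s over edges with residual capacity > 0; returns parent map
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if t not in parent:
            break
        # walk back from t to recover the path and its bottleneck k
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        k = min(res[e] for e in path)
        for u, v in path:
            res[(u, v)] -= k
            res[(v, u)] += k
        flow += k
    s_hat = set(parent)             # vertices still reachable from s in G_R
    cut_capacity = sum(c for (u, v), c in cap.items()
                       if u in s_hat and v not in s_hat)
    return flow, s_hat, cut_capacity
```

On termination the returned flow value and the capacity of the cut around `s_hat` coincide, which is exactly the statement of Lemma 4.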
### Example 2: Bipartite Matching
input: Given $n$ classes and $n$ rooms; we want to match classes to rooms.
Bipartite graph $G=(V,E)$ (unweighted and undirected)
- Vertices are either in set $L$ or $R$
- Edges only go between vertices of different sets
Matching: A subset of edges $M\subseteq E$ s.t.
- Each vertex has at most one edge from $M$ incident on it.
Maximum Matching: matching of the largest size.
We will reduce the problem to the problem of finding the maximum flow
#### Reduction
Given a bipartite graph $G=(V,E)$, construct a graph $G'=(V',E')$ such that
$$
|max-flow(G')|=|max-matching(G)|
$$
Let $s$ connect to all vertices in $L$, and let all vertices in $R$ connect to $t$.
$G'=G+s+t+$ added edges from $s$ to $L$ and from $R$ to $t$, with every edge directed from $s$ toward $t$ and given capacity $1$.
#### Proof of correctness
Claim: $G'$ has a flow of $k$ iff $G$ has a matching of size $k$
Proof: Two directions:
1. Say $G$ has a matching of size $k$, we want to prove $G'$ has a flow of size $k$.
2. Say $G'$ has a flow of size $k$, we want to prove $G$ has a matching of size $k$.
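
The reduction can be sketched as follows. This is a minimal illustration under assumptions of this sketch: every edge of $G'$ gets capacity 1, the added vertices are represented by the hypothetical tuples `("source",)` and `("sink",)`, and augmenting paths are found by DFS.

```python
from collections import defaultdict

def max_bipartite_matching(L, R, edges):
    """L, R: lists of vertex names; edges: list of (l, r) pairs.
    Returns a maximum matching as a list of (l, r) pairs."""
    s, t = ("source",), ("sink",)        # added vertices of G'
    res = defaultdict(int)               # residual capacities of G'
    adj = defaultdict(set)

    def add_edge(u, v):
        res[(u, v)] = 1                  # every edge of G' has capacity 1
        adj[u].add(v)
        adj[v].add(u)

    for l in L:
        add_edge(s, ("L", l))
    for r in R:
        add_edge(("R", r), t)
    for l, r in set(edges):
        add_edge(("L", l), ("R", r))

    def augment(u, visited):
        # DFS for an augmenting path; push 1 unit along it on the way back
        if u == t:
            return True
        visited.add(u)
        for v in adj[u]:
            if v not in visited and res[(u, v)] > 0 and augment(v, visited):
                res[(u, v)] -= 1
                res[(v, u)] += 1
                return True
        return False

    while augment(s, set()):
        pass
    # a saturated L -> R edge carries one unit of flow, i.e. it is matched
    return [(l, r) for l, r in set(edges) if res[(("L", l), ("R", r))] == 0]
```

Reading the matching back from the saturated middle edges is the "flow of size $k$ implies matching of size $k$" direction of the claim.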
## Conclusion: Maximum Flow
Problem input and target
Ford-Fulkerson Algorithm
- Execution: residual graph
- Runtime
FF correctness proof
- Max-flow Min-cut Theorem
- Graph Cut definition
- Capacity of cut
Reduction from Bipartite Matching to Maximum Flow
### Example 3: Image Segmentation: (reduction to min-cut)
Given:
- Image consisting of an object and a background.
- the object occupies some set of pixels $A$, while the background occupies the remaining pixels $B$.
Required:
- Separate $A$ from $B$, without knowing in advance which pixels belong to each.
- For each pixel $i,p_i$ is the probability that $i\in A$
- For each pair of adjacent pixels $i,j,c_{ij}$ is the cost of placing the object boundary between them. i.e. putting $i$ in $A$ and $j$ in $B$.
- A segmentation of the image is an assignment of each pixel to $A$ or $B$.
- The goal is to find a segmentation that maximizes
$$
\sum_{i\in A}p_i+\sum_{i\in B}(1-p_i)-\sum_{i,j\ on \ boundary}c_{ij}
$$
Solution:
- Let's turn our maximization into a minimization
- If the image has $N$ pixels, then we can rewrite the objective as
$$
N-\sum_{i\in A}(1-p_i)-\sum_{i\in B}p_i-\sum_{i,j\ on \ boundary}c_{ij}
$$
because $N=\sum_{i\in A}p_i+\sum_{i\in A}(1-p_i)+\sum_{i\in B}p_i+\sum_{i\in B}(1-p_i)$
New maximization problem:
$$
Max\left( N-\sum_{i\in A}(1-p_i)-\sum_{i\in B}p_i-\sum_{i,j\ on \ boundary}c_{ij}\right)
$$
Now, this is equivalent to minimizing
$$
\sum_{i\in A}(1-p_i)+\sum_{i\in B}p_i+\sum_{i,j\ on \ boundary}c_{ij}
$$
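
As a sanity check on this rewrite, the following brute-force sketch enumerates every segmentation of a tiny image and confirms that the maximization objective always equals $N$ minus the minimization objective. The function name `best_segmentation` and the input encoding (a list `p`, a list of adjacent index pairs, and a dict `c` of boundary costs) are assumptions of this sketch.

```python
from itertools import product

def best_segmentation(p, adjacent, c):
    """p[i] = probability that pixel i is in A; adjacent: list of (i, j);
    c[(i, j)] = boundary cost. Returns (labels, objective value)."""
    N = len(p)
    best_value, best_labels = float("-inf"), None
    for labels in product("AB", repeat=N):
        gain = sum(p[i] if labels[i] == "A" else 1 - p[i] for i in range(N))
        cost = sum(1 - p[i] if labels[i] == "A" else p[i] for i in range(N))
        boundary = sum(c[(i, j)] for i, j in adjacent if labels[i] != labels[j])
        # maximization objective == N - minimization objective
        assert abs((gain - boundary) - (N - (cost + boundary))) < 1e-9
        if gain - boundary > best_value:
            best_value, best_labels = gain - boundary, labels
    return best_labels, best_value
```

This takes exponential time and is only meant for checking the algebra on small inputs.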
Second step
- Form a graph with $N$ vertices, one vertex $v_i$ for each pixel
- Add vertices $s$ and $t$
- For each $v_i$, add edges $s\to v_i$ with capacity $p_i$ and $v_i\to t$ with capacity $1-p_i$. An $S-T$ cut of $G$ assigns each $v_i$ to either the $S$ side or the $T$ side.
- The $S$ side of an $S-T$ cut is the $A$ side, while the $T$ side of the cut is the $B$ side.
- Observe that if $v_i$ goes on the $S$ side, it becomes part of $A$, so the cut increases by $1-p_i$. Otherwise, it becomes part of $B$, so the cut increases by $p_i$ instead.
- Now add edges $v_i\to v_j$ with capacity $c_{ij}$ for all adjacent pixel pairs $i,j$
- If $v_i$ and $v_j$ end up on opposite sides of the cut (boundary), then the cut increases by $c_{ij}$.
- Conclude that any $S-T$ cut that assigns $S\subseteq V$ to the $A$ side and $V\backslash S$ to the $B$ side pays a total of
1. $1-p_i$ for each $v_i$ on the $A$ side
2. $p_i$ for each $v_i$ on the $B$ side
3. $c_{ij}$ for each adjacent pair $i,j$ that is at the boundary. i.e. $i\in S\ and\ j\in V\backslash S$
- Conclude that a cut with a capacity $c$ implies a segmentation with objective value $c$.
- The converse can (and should) be also checked: a segmentation with objective value $c$ implies an $S-T$ cut with capacity $c$.
#### Algorithm
- Given an image with $N$ pixels, build the graph $G$ as desired.
- Use the FF algorithm to find a minimum $S-T$ cut of $G$
- Use this cut to assign each pixel to $A$ or $B$ as described, i.e., pixels that correspond to vertices on the $S$ side are assigned to $A$ and those corresponding to vertices on the $T$ side to $B$.
- Minimizing the cut capacity minimizes our transformed minimization objective function.
#### Running time
The graph $G$ contains $\Theta(N)$ edges, because each pixel is adjacent to at most 4 neighbors plus $s$ and $t$.
FF algorithm has running time $O((m+n)|F|)$, where $|F|\leq n$ is the capacity of the min cut. The edge count is $m\leq 6n$.
So the total running time is $O(n^2)$

341
pages/CSE347/CSE347_L5.md Normal file

@@ -0,0 +1,341 @@
# Lecture 5
## Takeaway from Bipartite Matching
- We saw how to solve a problem (bi-partite matching and others) by reducing it to another problem (maximum flow).
- In general, we can design an algorithm that maps instances of a new problem to instances of a known solvable problem (e.g., max-flow) in order to solve the new problem!
- Mapping from one problem to another which preserves solutions is called reduction.
## Reduction: Basic Idea
Convert solutions to the known problem to the solutions to the new problem
- Instance of new problem
- Instance of known problem
- Solution of known problem
- Solution of new problem
## Reduction: Formal Definition
Problems $L,K$.
$L$ reduces to $K$ ($L\leq K$) if there is a mapping $\phi$ from **any** instance $l\in L$ to some instance $\phi(l)\in K'\subset K$, such that the solution for $\phi(l)$ yields a solution for $l$.
This means that **L is no harder than K**
### Using reduction to design algorithms
In the example of reduction to solve Bipartite Matching:
$L:$ Bipartite Matching
$K:$ Max-flow Problem
Efficiency:
1. Reduction: $\phi:l\to\phi(l)$ (Polynomial time reduction $\phi(l)$)
2. Solve problem $\phi(l)$ (Polynomial time to solve $poly(g)$)
3. Convert the solution for $\phi(l)$ to a solution to $l$ (Polynomial time to convert $poly(g)$)
### Efficient Reduction
A reduction $\phi:l\to\phi(l)$ is efficient ($L\leq_p K$) if for any $l\in L$:
1. $\phi(l)$ is computable from $l$ in polynomial ($|l|$) time.
2. Solution to $l$ is computable from solution of $\phi(l)$ in polynomial ($|l|$) time.
We say $L$ is **poly-time reducible** to $K$, or $L$ poly-time reduces to $K$.
### Which problem is harder?
Theorem: If $L\leq_p K$ and there is a polynomial time algorithm to solve $K$, then there is a polynomial time algorithm to solve $L$.
Proof: Given an instance $l\in L$, we solve it in three steps, each polynomial in the size of its own input:
1. Compute $\phi(l)$: $p(l)$
2. Solve $\phi(l)$: $p(\phi(l))$
3. Convert solution: $p(\phi(l))$
Total time: $p(l)+p(\phi(l))+p(\phi(l))=p(l)+p(\phi(l))$
Need to show: $|\phi(l)|=poly(|l|)$
Proof:
Since we can compute $\phi(l)$ in $p(|l|)$ time, and each time step can only write a constant amount of data, the output cannot be longer than the running time.
So $|\phi(l)|=poly(|l|)$
## Hardness Problems
Reductions show the relationship between problem hardness!
Question: Could you solve a problem in polynomial time?
Easy: polynomial time solution
Hard: No polynomial time solution (as far as we know)
### Types of Problems
Decision Problem: Yes/No answer
Examples: Subset sums
1. Is there a flow of size $F$?
2. Is there a shortest path of length $L$ from vertex $u$ to vertex $v$?
3. Given a set of intervals, can you schedule $k$ of them?
Optimization Problem: What is the value of an optimal feasible solution of a problem?
- Minimization: Minimize cost
- min cut
- minimal spanning tree
- shortest path
- Maximization: Maximize profit
- interval scheduling
- maximum flow
- maximum matching
#### Canonical Decision Problem
Does the instance $l\in L$ (an optimization problem) have a feasible solution with objective value $k$:
Objective value $\geq k$ (maximization) $\leq k$ (minimization)
$DL$ denotes the canonical decision problem of $L$
##### Hardness of Canonical Decision Problems
Lemma 1: $DL\leq_p L$ ($DL$ is no harder than $L$)
Proof: Assume $L$ is a **maximization** problem, so $DL(l,k)$ asks: does $l$ have a solution with objective value $\geq k$?
Example: Does graph $G$ have flow $\geq k$.
Let $v^*(l)$ be the maximum objective value on $l$, obtained by solving $l$.
Let the instance of $DL:(l,k)$ and $l$ be the problem and $k$ be the objective
1. $l\to \phi(l)\in L$ (optimization problem) $\phi(l,k)=l$
2. Is $v^*(l)\geq k$? If so, return true, else return false.
Lemma 2: If $v^* =O(c^{|l|})$ for some constant $c$, then $L\leq_p DL$.
Proof: First we show $L\leq DL$. Suppose $L$ is a maximization problem, so the canonical decision problem asks: is there a solution $\geq k$?
Naïve Linear Search: Ask $DL(l,1)$; while the answer is true, ask $DL(l,k+1)$, until it returns false. The last $k$ that returned true is $v^*(l)$.
Runtime: At most $v^*(l)+1$ queries to iterate over all possibilities.
This is exponential! How to reduce it?
Our old friend Binary (exponential) Search is back!
Try powers of 2 until you get a no, then do binary search
\# questions: $O(\log_2(v^*(l)))=poly(|l|)$
Binary search in area: from last yes to first no.
Runtime: $O(\log(v^*(l)))$ oracle calls
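
The search can be sketched as below. It is a minimal sketch, assuming a maximization problem whose optimum is a non-negative integer and a monotone oracle `decide(k)` that returns True exactly when a solution of value $\geq k$ exists; both names are hypothetical.

```python
def optimum_via_decision(decide):
    """Recover the optimum v* using only O(log v*) calls to decide."""
    lo, hi = 0, 1               # value >= 0 is always achievable
    while decide(hi):           # exponential phase: double until the first "no"
        lo, hi = hi, hi * 2
    # invariant: decide(lo) is true and decide(hi) is false
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if decide(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

For example, `optimum_via_decision(lambda k: k <= 37)` walks through 1, 2, 4, ..., 64 and then bisects between 32 and 64.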
### Reduction for Algorithm Design vs Hardness
For problems $L,K$ with $L\leq_p K$
If $K$ is “easy” (exists a poly-time solution), then $L$ is also easy.
If $L$ is “hard” (no poly-time solution), then $K$ is also hard.
Every problem that we worked on so far, $K$ is “easy”, so we reduce from new problem to known problem (e.g., max-flow).
#### Reduction for Hardness: Independent Set (ISET)
Input: Given an undirected graph $G = (V,E)$,
A subset of vertices $S\subset V$ is called an **independent set** if no two vertices of $S$ are connected by an edge.
Problem: Does $G$ contain an independent set of size $\geq k$?
$ISET(G,k)$ returns true if $G$ contains an independent set of size $\geq k$, and false otherwise.
Algorithm? NO! We think that this is a hard problem.
A lot of people have tried and could not find a poly-time solution
### Example: Vertex Cover (VC)
Input: Given an undirected graph $G = (V,E)$
A subset of vertices $C\subset V$ is called a **vertex cover** if it contains at least one end point of every edge.
Formally, for all edges $(u,v)\in E$, either $u\in C$, or $v\in C$.
Problem: $VC(G,j)$ returns true if $G$ has a vertex cover of size $\leq j$, and false otherwise (minimization problem)
Example:
#### How hard is Vertex Cover?
Claim: $ISET\leq_p VC$
Side Note: when we prove $VC$ is hard, we prove it is no easier than $ISET$.
DO NOT prove $VC\leq_p ISET$; that is the wrong direction.
Proof: Show that $G=(V,E)$ has an independent set of size $k$ **if and only if** the same graph (not always!) has a vertex cover of size $|V|-k$.
Map:
$$
ISET(G,k)\to VC(G,|V|-k)
$$
$G'=G$
##### Proof of reduction: Direction 1
Claim 1: $ISET$ of size $k\to$ $VC$ of size $|V|-k$
Proof: Assume $G$ has an $ISET$ of size $k:S$, consider $C = V-S,|C|=|V|-k$
Claim: $C$ is a vertex cover
##### Proof of reduction: Direction 2
Claim 2: $VC$ of size $|V|-k\to ISET$ of size $k$
Proof: Assume $G$ has a $VC$ of size $|V|-k$: $C$, consider $S = V-C$, $|S| =k$
Claim: $S$ is an independent set
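
Both directions can be checked exhaustively on a small graph. The sketch below assumes the graph is given as a vertex list and a list of edge pairs; the helper names are hypothetical.

```python
from itertools import combinations

def is_independent_set(edges, S):
    # no edge has both endpoints in S
    return not any(u in S and v in S for u, v in edges)

def is_vertex_cover(edges, C):
    # every edge has at least one endpoint in C
    return all(u in C or v in C for u, v in edges)

def check_complement_property(V, edges):
    """For every subset S of V: S is independent  <=>  V - S is a vertex cover."""
    for k in range(len(V) + 1):
        for subset in combinations(V, k):
            S = set(subset)
            if is_independent_set(edges, S) != is_vertex_cover(edges, set(V) - S):
                return False
    return True
```

On a 4-cycle, for instance, `{0, 2}` is independent and its complement `{1, 3}` covers every edge.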
### What does poly-time mean?
Algorithm runs in time polynomial to input size.
- If the input has $n$ items, an algorithm that runs in $\Theta(n^c)$ for some constant $c$ is poly-time.
- Examples: intervals to schedule, number of integers to sort, # vertices + # edges in a graph
- Numerical Value (Integer $n$), what is the input size?
- Examples: weights, capacity, total time, flow constraints
- It is not straightforward!
### Real time complexity of F-F?
In class: $O(F( |V| + |E|))$
- $|V| + |E|$ = this much space to represent the graph
- $F$ : size of the maximum flow.
If every edge has capacity at most $C$, then $F = O(C|E|)$
Running time: $O(C|E|(|V| + |E| ))$
### What is the actual input size?
Each edge ($|E|$ edges):
- 2 vertices: $|V|$ distinct symbols, $\log |V|$ bits per symbol
- 1 capacity: $\log C$ bits
Size of graph:
- $O(|E|(\log |V| + \log C))$
- $p( |E| , |V| , \log C)$
Running time:
- $p( |E| , |V| , C )$
- Exponential if $C$ is exponential in $|V|+|E|$
### Pseudo-polynomial
Naïve Ford-Fulkerson is bad!
A problem's inputs contain some numerical values, say $W$. We need only $\log W$ bits to store $W$. If an algorithm runs in $p(W)$, then it is exponential in the input size, or **pseudo-polynomial**.
In homework, you improved F-F to make it work in
$p( |V| ,|E| , \log C)$, to make it a real polynomial algorithm.
## Conclusion: Reductions
- Reduction
- Construction of mapping with runtime
- Bidirectional proof
- Efficient Reduction $L\leq_p K$
- Which problem is harder?
- If $L$ is hard, then $K$ is hard. $\to$ Used to show hardness
- If $K$ is easy, then $L$ is easy. $\to$ Used for design algorithms
- Canonical Decision Problem
- Reduction to and from the optimization problem
- Reduction for hardness
- Independent Set $\leq_p$ Vertex Cover
## In class
Reduction: $V^* = O(c^k)$
OPT: Find the max flow of an instance $(G,s,t)$
DEC: Is there a flow of size $\geq k$, given $G,s,t \implies$ the instance is defined by the tuple $(G,s,t,k)$
Yes, if there exists one
No, otherwise
Forget about F-F and assume that you have an oracle that solves the decision problem.
First solution (the naive solution): iterate over $k = 1, 2, \dots$ until the oracle returns false and the last one returns true would be the max flow.
Time complexity: $K\cdot X$, where $X$ is the time complexity of the oracle
Input size: $poly(|V|,|E|, |E|\log(maxCapacity))$, and $V^* \leq \sum$ capacities
A better solution: do a binary search. If there is no upper bound, we use exponential binary search instead. Then,
$$
\begin{aligned}
X\cdot \log(V^*) &\leq X\cdot \log(\sum capacities)\\
&\leq X\cdot \log(|E|\cdot maxCapacity)\\
&= X\cdot (\log|E| + \log(maxCapacity))
\end{aligned}
$$
As $\log(maxCapacity)$ is linear in the size of the input, the running time is polynomial in the input size of the original problem.
Assume that ISET is a hard problem, i.e. we don't know of any polynomial time solution. We want to show that vertex cover is also a hard problem here:
$ISET \leq_{p} VC$
1. Given an instance of ISET, construct an instance of VC
2. Show that the construction can be done in polynomial time
3. Show that if the ISET instance is true then the VC instance is true
4. Show that if the VC instance is true then the ISET instance is true.
> ISET: given $(G,K)$, is there a set of $K$ vertices, no two of which share an edge?
>
> VC: given $(G,K)$, is there a set of $K$ vertices that covers all edges?
1. Given $l: (G,K)$ being an instance of ISET, we construct $\phi(l): (G',K')$ as an instance of VC. $\phi(l): (G, |V|-K), \textup{i.e., } G' = G \textup{ and } K' = |V| - K$
2. The construction clearly takes polynomial time, since copying the graph is linear in the size of the graph and the subtraction of integers is constant time.
**Direction 1**: ISET of size k $\implies$ VC of size $|V| - K$ Assume that ISET(G,K) returns true, show that $VC(G, |V|-K)$ returns true
Let $S$ be an independent set of size $K$ and $C = V-S$
We claim that $C$ is a vertex cover of size $|V|-K$
Proof:
We proceed by contradiction. Assume that $C$ is NOT a vertex cover, which means that there is an edge $(u,v)$ such that $u\notin C , v\notin C$. This implies that $u\in S , v\in S$, which contradicts the assumption that $S$ is an independent set.
Therefore, $C$ is a vertex cover
**Direction 2**: VC of size $|V|-K \implies$ ISET of size $K$
Let $C$ be a vertex cover of size $|V|-K$, and let $S = V - C$
We claim that $S$ is an independent set of size $K$.
Again, assume, for the sake of contradiction, that $S$ is not an independent set. And we get
$\exists (u,v)\in E \textup{ such that } u\in S, v \in S$
$\implies u,v \notin C$
$\implies C \textup{ is not a vertex cover}$
And this is a contradiction with our assumption.

287
pages/CSE347/CSE347_L6.md Normal file

@@ -0,0 +1,287 @@
# Lecture 6
## NP-completeness
### $P$: Polynomial-time Solvable
$P$: Class of decision problems $L$ such that there is a polynomial-time algorithm that correctly answers yes or no for every instance $l\in L$.
We say algorithm $A$ **decides** $L$ if $A$ always answers correctly for any instance $l\in L$.
Example:
Is the number $n$ prime? Best algorithm so far: $O(\log^6 n)$, 2002
## Introduction to NP
- NP $\neq$ Non-polynomial; NP stands for Non-deterministic Polynomial time
- Let $L$ be a decision problem.
- Let $l$ be an instance of the problem that the answer happens to be "yes".
- A **certificate** $c(l)$ for $l$ is a "proof" that the answer for $l$ is true. [$l$ is a true instance]
- For canonical decision problems for optimization problems, the certificate is often a feasible solution for the corresponding optimization problem.
### Example of certificates
- Problem: Is there a path from $s$ to $t$
- Instance: graph $G(V,E),s,t$.
- Certificate: path from $s$ to $t$.
- Problem: Can I schedule $k$ intervals in the room so that they do not conflict.
- Instance: $l:(I,k)$
- Certificate: set of $k$ non-conflicting intervals.
- Problem: ISET
- Instance: $G(V,E),k$.
- Certificate: $k$ vertices with no edges between them.
If the answer to the problem is NO, you don't need to provide anything to prove that.
### Useful certificates
For a problem to be in NP, the problem needs to have "useful" certificates. What is considered a good certificate?
- Easy to check
- Verifying algorithm which can check a YES answer and a certificate in $poly(l)$
- Not too long: [$poly(l)$]
### Verifier Algorithm
**Verifier algorithm** is one that takes an instance $l\in L$ and a certificate $c(l)$ and says yes if the certificate proves that $l$ is a true instance and false otherwise.
$V$ is a poly-time verifier for $L$ if it is a verifier and runs in $poly(|l|,|c|)$ time. ($|c|=poly(|l|)$)
- The runtime must be polynomial
- Must check **every** problem constraint
- Not always trivial
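
As an illustration, here is a minimal verifier sketch for ISET. It assumes vertices are numbered `0..n-1`, the graph is an edge list, and the certificate is a list of vertices; the name `verify_iset` is hypothetical.

```python
def verify_iset(n, edges, k, certificate):
    """Return True iff certificate proves that the graph has an independent set of size >= k."""
    S = set(certificate)
    # the certificate must name at least k distinct vertices
    if len(S) != len(certificate) or len(S) < k:
        return False
    # every named vertex must actually exist in the graph
    if any(v not in range(n) for v in S):
        return False
    # no edge may have both endpoints inside S
    return not any(u in S and v in S for u, v in edges)
```

Each check is a problem constraint: skipping the distinctness or range check would let an invalid certificate pass.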
## Class NP
**NP:** A class of decision problems for which there exist a certificate schema $c$ and a verifier algorithm $V$ such that:
1. certificate is $poly(l)$ in size.
2. $V:poly(l)$ in time.
**P:** is a class of problems that you can **solve** in polynomial time
**NP:** is a class of problems that you can **verify** TRUE instances in polynomial time given a poly-size certificate
**Millennium question**
$P\subseteq NP$? $NP\subseteq P$?
$P\subseteq NP$ is true.
Proof: Let $L$ be a problem in $P$, we want to show that there is a polynomial size certificate with a poly-time verifier.
There is an algorithm $A$ which solves $L$ in polynomial time.
**Certificate:** empty thing.
**Verifier:** $(l,c)$
1. Discard $c$.
2. Run $A$ on $l$ and return the answer.
Nobody knows whether $NP\subseteq P$. Sad.
### Class of problem: NP complete
Informally: hardest problem in NP
Consider a problem $L$.
- We want to show that if $L\in P$, then $NP\subseteq P$
**NP-hard**: A decision problem $L$ is NP-hard if for any problem $K\in NP$, $K\leq_p L$.
$L$ is at least as hard as all the problems in NP. If we have an algorithm for $L$, we have an algorithm for any problem in NP with only polynomial time extra cost.
MindMap:
$K\implies L\implies sol(L)\implies sol(K)$
#### Lemma $P=NP$
Let $L$ be an NP-hard problem. If $L\in P$, then $P=NP$.
Proof:
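Say $L$ has a poly-time solution, and take any problem $K$ in $NP$.
Since $L$ is NP-hard, $K\leq_p L$, so $K$ also has a poly-time solution, i.e. $K\in P$. Hence $NP\subseteq P$; together with $P\subseteq NP$, this gives $P=NP$.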
**NP-complete:** $L$ is **NP-complete** if it is both NP-hard and $L\in NP$.
**NP-optimization:** $L$ is **NP-optimization** problem if the canonical decision problem is NP-complete.
**Claim:** If any NP-optimization problem has a polynomial-time solution, then $P=NP$.
### Is $P=NP$?
- Answering this problem is hard.
- But for any NP-complete problem $L$, if you could find a poly-time algorithm for $L$, then you would have answered this question.
- Therefore, finding a poly-time algorithm for $L$ is hard.
## NP-Complete problem
### Satisfiability (SAT)
Boolean Formulas:
A set of Boolean variables:
$x,y,a,b,c,w,z,...$ they take values true or false.
A boolean formula is a formula of Boolean variables with and, or and not.
Examples:
$\phi:x\land (\neg y \lor z)\land\neg(y\lor w)$
$x=1,y=0,z=1,w=0$, the formula is $1$.
**SAT:** given a formula $\phi$, is there a setting $M$ of variables such that $\phi$ evaluates to True under this setting?
If there is such assignment, then $\phi$ is satisfiable. Otherwise, it is not.
Example: $x\land y\land \neg(x\lor y)$ is not satisfiable.
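
The two example formulas can be checked by brute force. This is an exponential-time sketch for illustration only; formulas are encoded as Python functions over an assignment dict, and the names `satisfiable`, `phi`, and `psi` are assumptions of this sketch.

```python
from itertools import product

def satisfiable(variables, formula):
    """Try all 2^n assignments; return a satisfying one, or None."""
    for values in product([False, True], repeat=len(variables)):
        M = dict(zip(variables, values))
        if formula(M):
            return M
    return None

# phi = x AND (NOT y OR z) AND NOT (y OR w)
phi = lambda M: M["x"] and (not M["y"] or M["z"]) and not (M["y"] or M["w"])
# psi = x AND y AND NOT (x OR y), which no assignment can satisfy
psi = lambda M: M["x"] and M["y"] and not (M["x"] or M["y"])
```

Evaluating a formula under a given assignment is cheap; it is the search over all assignments that is expensive.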
Seminal work by Cook and Levin in the early 1970s showed that SAT is NP-complete.
1. SAT is in NP
Proof:
$\exists$ a certificate schema and a poly-time verifier.
Certificate $c$: a satisfying assignment $M$. Verifier $V$: check that $M$ makes $\phi$ true.
2. SAT is NP-hard. We can just accept it as a fact.
#### How to show a problem is NP-complete?
Say we have a problem $L$.
1. Show that $L\in NP$.
Exists certificate schema and verification algorithm in polynomial time.
2. Prove that we can reduce SAT to $L$. $SAT\leq_p L$ **(NOT $L\leq_p SAT$)**
Solving $L$ also solves SAT.
### CNF-SAT
**CNF:** Conjunctive normal form of SAT
The formula $\phi$ must be an "and of ors"
$$
\phi=\land_{i=1}^n(\lor^{m_i}_{j=1}l_{i,j})
$$
$l_{i,j}$: literal (a variable or its negation); each parenthesized "or" of literals is a clause
### 3-CNF-SAT
**3-CNF-SAT:** CNF-SAT where every clause has exactly 3 literals.
3-CNF-SAT is NP-complete (not every variant is: 2-CNF-SAT is in P)
Input: 3-CNF expression with $n$ variables and $m$ clauses in the form:
number of total literals: $3m$
Output: An assignment of the $n$ variables such that at least one literal from each clause evaluates to true.
Note:
1. One variable can be used to satisfy multiple clauses.
2. $x_i$ and $\neg x_i$ cannot both evaluate to true.
Example: ISET is NP-complete.
Proof:
Say we have a problem $L$
1. Show that $ISET\in NP$
Certificate: set of $k$ vertices: $|S|=k\in poly(g)$\
Verifier: checks that there are no edges between them $O(E k^2)$
2. ISET is NP-hard. We need to prove $3SAT\leq_p ISET$
- Construct a reduction from $3SAT$ to $ISET$.
- Show that $ISET$ is at least as hard as $3SAT$.
We need to prove $\phi\in 3SAT$ is satisfiable if and only if the constructed $G$ has an $ISET$ of size $\geq k=m$
#### Reduction mapping construction
We construct an ISET instance from $3-SAT$.
Suppose the formula has $n$ variables and $m$ clauses
1. for each clause, we construct vertex for each literal and connect them (for $x\lor \neg y\lor z$, we connect $x,\neg y,z$ together)
2. then we connect all the literals with their negations (connects $x$ and $\neg x$)
$\implies$
If $\phi$ has a satisfiable assignment, then $G$ has an independent set of size $\geq m$,
To build a set $S$, we pick exactly one true literal from every clause and take the vertex corresponding to that literal in that clause, so $|S|=m$
Must also argue that $S$ is an independent set.
Example: picked a set of vertices $|S|=4$.
A literal has edges:
- To all literals in the same clause: We never pick two literals from the same clause.
- To its negation.
Since it is a satisfiable 3-SAT assignment, $x$ and $\neg x$ cannot both evaluate to true, those edges are not a problem, so $S$ is an independent set.
$\impliedby$
If $G$ has an independent set of size $\geq m$, then $\phi$ is satisfiable.
Say that $S$ is an independent set of size $m$, we need to construct a satisfiable assignment for the original $\phi$.
- If $S$ contains a vertex corresponding to literal $x_i$, then set $x_i$ to true.
- If $S$ contains a vertex corresponding to literal $\neg x_i$, then set $x_i$ to false.
- Other variables can be set arbitrarily
Question: Is it a valid 3-SAT assignment?
Your ISET $S$ can contain at most $1$ vertex from each clause. Since vertices in a clause are connected by edges.
- Since $S$ contains $m$ vertices, it must contain exactly $1$ vertex from each clause.
- Therefore, we will make at least $1$ literal from each clause true.
- Therefore, all the clauses are true and $\phi$ is satisfied.
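
The graph construction can be sketched as follows. The encoding is an assumption of this sketch: a literal is a nonzero integer (`i` for $x_i$, `-i` for $\neg x_i$), a clause is a 3-tuple of literals, and each vertex is a tuple `(clause index, position, literal)`; the name `three_sat_to_iset` is hypothetical.

```python
def three_sat_to_iset(clauses):
    """Map a 3-CNF formula to an ISET instance (vertices, edges, k)."""
    vertices = [(ci, pos, lit)
                for ci, clause in enumerate(clauses)
                for pos, lit in enumerate(clause)]
    edges = set()
    for a in range(len(vertices)):
        for b in range(a + 1, len(vertices)):
            (ca, _, la), (cb, _, lb) = vertices[a], vertices[b]
            # connect literals of the same clause, and every literal to its negation
            if ca == cb or la == -lb:
                edges.add((vertices[a], vertices[b]))
    return vertices, edges, len(clauses)
```

For $(x_1\lor\neg x_2\lor x_3)\land(\neg x_1\lor x_2\lor x_3)$ this yields 6 vertices, 8 edges, and $k=2$; the construction is quadratic in the number of literals, so it runs in polynomial time.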
## Conclusion: NP-completeness
- Prove NP-Complete:
- If NP-optimization, convert to canonical decision problem
- Certificate, Verification algorithm
- Prove NP-hard: reduce from existing NP-Complete problems
- 3-SAT Problem:
- Input, output, constraints
- A well-known NP-Complete problem
- Reduce from 3-SAT to ISET to show ISET is NP-Complete
## In class
### NP-complete
$p\in NP$, if we have a certificate schema and a verifier algorithm.
### NP-complete proof
#### P is in NP
Describe what a certificate looks like, and show that its size is polynomial in the problem size.
Design a verifier algorithm that checks whether a certificate indeed proves that the answer is YES, and that runs in polynomial time. Inputs: certificate and the problem input $poly(|l|,|c|)=poly(|p|)$
#### P is NP hard
select an already known NP-hard problem: eg. 3-SAT, ISET, VC,...
show that $3-SAT\leq_p p$
- present an algorithm that maps any instance of 3-SAT (or the chosen NP-hard problem) to an instance of $p$.
- show that the construction is done in polynomial time.
- show that if $p$'s instance answer is YES, then the instance of 3-SAT is YES.
- show that if 3-SAT's instance answer is YES then the instance of $p$ is YES.

310
pages/CSE347/CSE347_L7.md Normal file

File diff suppressed because one or more lines are too long

353
pages/CSE347/CSE347_L8.md Normal file

@@ -0,0 +1,353 @@
# Lecture 8
## NP-optimization problem
Cannot be solved in polynomial time (unless $P=NP$).
Example:
- Maximum independent set
- Minimum vertex cover
What can we do?
- solve small instances
- hard instances are rare - average case analysis
- solve special cases
- find an approximate solution
## Approximation algorithms
We find a "good" solution in polynomial time, but may not be optimal.
Example:
- Minimum vertex cover: we will find a small vertex cover, but not necessarily the smallest one.
- Maximum independent set: we will find a large independent set, but not necessarily the largest one.
Question: How do we quantify the quality of the solution?
### Approximation ratio
Intuition:
How good is an algorithm $A$ compared to an optimal solution in the worst case?
Definition:
Consider algorithm $A$ for an NP-optimization problem $L$. Say for **any** instance $l$, $A$ finds a solution output $c_A(l)$ and the optimal solution is $c^*(l)$.
Approximation ratio is either:
$$
\min_{l \in L} \frac{c_A(l)}{c^*(l)}=\alpha
$$
for maximization problems ($\alpha\leq 1$), or
$$
\max_{l \in L} \frac{c_A(l)}{c^*(l)}=\alpha
$$
for minimization problems ($\alpha\geq 1$).
Example:
Alice's Algorithm, $A$, finds a vertex cover of size $c_A(l)$ for instance $l(G)$. The optimal vertex cover has size $c^*(l)$.
We want approximation ratio to be as close to 1 as possible.
> Vertex cover:
>
> A vertex cover is a set of vertices that touches all edges.
Let's try an approximation algorithm for the vertex cover problem, called Greedy cover.
#### Greedy cover
Pick any uncovered edge, both its endpoints are added to the cover $C$, until all edges are covered.
Runtime: $O(m)$
Claim: Greedy cover is correct, and it finds a vertex cover.
Proof:
Algorithm only terminates when all edges are covered.
Claim: Greedy cover is a 2-approximation algorithm.
Proof:
Look at the edges we picked: no two of them share an endpoint, because an edge is only picked while both of its endpoints are outside the cover.
Any vertex cover, including the optimal one, must contain at least one endpoint of each picked edge, so $c^*(l)$ is at least the number of picked edges.
Greedy cover adds both endpoints of each picked edge, so $c_A(l)$ is exactly twice the number of picked edges.
This bound is tight in the worst case. (Consider the graph with disjoint edges.)
Thus, the size of the vertex cover found by Greedy cover is at most twice the size of the optimal vertex cover.
Thus, Greedy cover is a 2-approximation algorithm.
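Greedy cover can be sketched in a few lines. This is a minimal illustration, assuming the graph is given as a list of edges; the function name `greedy_vertex_cover` and the example path are hypothetical.

```python
def greedy_vertex_cover(edges):
    """Return a vertex cover built by taking both endpoints of each uncovered edge."""
    cover = set()
    for u, v in edges:
        # An edge is uncovered only if neither endpoint is in the cover yet.
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover


# Path a-b-c-d: the optimal cover is {b, c}, of size 2.
path = [("a", "b"), ("b", "c"), ("c", "d")]
cover = greedy_vertex_cover(path)
```

On this path the sketch picks the edges a-b and c-d, so it returns all four vertices, exactly twice the optimum.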
> Min-cut:
>
> Given a graph $G$ and two vertices $s$ and $t$, find the minimum cut between $s$ and $t$.
>
> Max-cut:
>
> Given a graph $G$, find the maximum cut.
#### Local cut
Algorithm:
- start with an arbitrary cut of $G$.
- While you can move a vertex from one side to the other side while increasing the size of the cut, do so.
- Return the cut found.
We will prove its:
- Runtime
- Feasibility
- Approximation ratio
##### Runtime for local cut
The size of a cut is at most $|E|$.
When we move a vertex from one side to the other side, the size of the cut increases by at least 1.
Thus, the algorithm terminates after at most $|E|$ moves.
Finding a vertex to move takes $O(|V|^2)$ time (for each vertex, compare its crossing and non-crossing edges), so the runtime is $O(|E||V|^2)$.
##### Feasibility for local cut
The algorithm only terminates when no more vertices can be moved.
Every partition of $V$ into two sides is a cut, so the cut found is a feasible solution.
##### Approximation ratio for local cut
This is a half-approximation algorithm.
We need to show that the size of the cut found is at least half of the size of the optimal cut.
We first upper bound the optimal cut: its size is at most $|E|$.
We then prove that the solution we found has size at least $\frac{|E|}{2}$, which is at least half of the optimal cut, for any graph $G$.
Proof:
When we terminate, no vertex can be moved to increase the cut.
Therefore, for every vertex $u$, **the number of crossing edges at $u$ is at least the number of non-crossing edges at $u$** (otherwise moving $u$ would increase the cut).
Let $d(u)$ be the degree of vertex $u\in V$.
The number of crossing edges incident to $u$ is at least $\frac{1}{2}d(u)$.
Summing over all vertices counts every crossing edge twice (once per endpoint), so twice the number of crossing edges is at least $\frac{1}{2}\sum_{u\in V}d(u)=\frac{1}{2}\cdot 2|E|=|E|$.
So the number of crossing edges is at least $\frac{|E|}{2}$, and the number of non-crossing edges is at most $\frac{|E|}{2}$.
EOP
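The local cut procedure can be sketched as follows. This is a minimal illustration, assuming a simple undirected graph (no self-loops) given as a vertex list and an edge list; the name `local_cut` and the starting cut (all vertices on side 0) are choices made here, not part of the lecture.

```python
def local_cut(vertices, edges):
    """Local search for max-cut; returns (side, cut_size), side maps vertex -> 0 or 1."""
    neighbors = {v: [] for v in vertices}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    side = {v: 0 for v in vertices}  # arbitrary starting cut: everything on side 0
    improved = True
    while improved:
        improved = False
        for v in vertices:
            crossing = sum(1 for w in neighbors[v] if side[w] != side[v])
            non_crossing = len(neighbors[v]) - crossing
            # Moving v swaps its crossing and non-crossing edges,
            # so the move helps exactly when non_crossing > crossing.
            if non_crossing > crossing:
                side[v] = 1 - side[v]
                improved = True

    cut_size = sum(1 for u, v in edges if side[u] != side[v])
    return side, cut_size


# 4-cycle: the maximum cut contains all 4 edges.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
side, cut_size = local_cut([0, 1, 2, 3], square)
```

Whatever the starting cut, the returned cut contains at least half of the edges, which matches the bound proved above.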
#### Set cover
Problem:
You are collecting a set of magic cards.
$X$ is the set of all possible cards. You want at least one of each card.
Each dealer $j$ has a pack $S_j\subseteq X$ of cards. You have to buy entire pack or none from dealer $j$.
Goal: What is the least number of packs you need to buy to get all cards?
Formally:
Input: a universe $X$ of $n$ elements, and a collection $Y=\{S_1, S_2, \ldots, S_m\}$ of subsets of $X$ (each $S_i\subseteq X$).
Goal: Pick $C\subseteq Y$ such that $\bigcup_{S_i\in C}S_i=X$, and $|C|$ is minimized.
Set cover is an NP-optimization problem. It is a generalization of the vertex cover problem.
#### Greedy set cover
Algorithm:
- Start with empty set $C$.
- While there is an element of $X$ that is not covered, pick the set $S_i\in Y$ that covers the largest number of uncovered elements.
- Add $S_i$ to $C$.
- Return $C$.
```python
def greedy_set_cover(X, Y):
    # X is the set of elements
    # Y is the collection of sets, hashset by default
    C = []

    def non_covered_elements(X, C):
        # return the elements in X that are not covered by C
        # O(|X||C|)
        return [x for x in X if not any(x in c for c in C)]

    non_covered = non_covered_elements(X, C)
    # every loop reduces the size of non_covered by at least 1
    while non_covered:
        max_cover, max_set = 0, None
        # O(|Y|)
        for S in Y:
            # Intersection of two sets is O(min(|X|,|S|))
            cur_cover = len(set(non_covered) & set(S))
            if cur_cover > max_cover:
                max_cover, max_set = cur_cover, S
        if max_set is None:
            # no set covers the remaining elements, so no set cover exists
            raise ValueError("Y does not cover X")
        C.append(max_set)
        non_covered = non_covered_elements(X, C)
    return C
```
It is not optimal.
Need to prove its:
- Correctness:
Keep picking until all elements are covered.
- Runtime:
$O(|X||Y|^2)$
- Approximation ratio:
##### Approximation ratio for greedy set cover
> Harmonic number:
>
> $H_n=\sum_{i=1}^n\frac{1}{i}=\frac{1}{1}+\frac{1}{2}+\frac{1}{3}+\cdots+\frac{1}{n}=\Theta(\log n)$
We claim that the size of the set cover found is at most $H_n=\Theta(\log n)$ times the size of the optimal set cover.
###### First bound:
Proof:
If the optimal solution picks $k$ sets, then the set cover found has at most $(1+\ln n)k$ sets.
Let $n=|X|$.
Let $U_i$ be the set of elements still uncovered after round $i$, and let $x_i$ be the number of new elements covered in round $i$.
Before the first round, nothing is covered, so the number of uncovered elements is $n$.
$$
|U_0|=n
$$
After the first round, the number of uncovered elements is $|U_0|-x_1$, where $x_1=|S_1|$ is the number of elements in the set picked in the first round.
$$
|U_1|=|U_0|-|S_1|
$$
...
We claim that $x_i\geq \frac{|U_{i-1}|}{k}$.
We proceed by contradiction.
Suppose every set in the optimal solution covers $< \frac{|U_{i-1}|}{k}$ elements of $U_{i-1}$. Then the $k$ optimal sets together cover $< |U_{i-1}|$ elements of $U_{i-1}$, so they do not cover all of $X$, a contradiction.
So some set covers at least $\frac{|U_{i-1}|}{k}$ uncovered elements, and the greedy choice covers at least as many. Therefore $|U_i|\leq |U_{i-1}|\left(1-\frac{1}{k}\right)$, and by induction $|U_i|\leq n\left(1-\frac{1}{k}\right)^i$.
> Some math magics:
> $$(1-\frac{1}{k})^k\leq \frac{1}{e}$$
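Before the last round at least one element is still uncovered, so $1\leq |U_{|C|-1}|\leq n(1-\frac{1}{k})^{|C|-1}\leq n e^{-(|C|-1)/k}$, which gives $|C|\leq 1+k\ln n$.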
So the size of the set cover found is at most $(1+\ln n)k$.
EOP
So the greedy set cover is not too bad...
###### Second bound:
Greedy set cover is a $H_d$-approximation algorithm of set cover.
Proof:
Assign a cost to the elements of $X$ according to the decisions of the greedy set cover.
Let $S^i$ be the set picked at step $i$, and let $\delta(S^i)$ be the number of new elements covered by $S^i$.
$$
\delta(S^i)=|S^i\cap U_{i-1}|
$$
If the element $x$ is first covered at step $i$, when set $S^i$ is picked, then the cost of $x$ is
$$
\frac{1}{\delta(S^i)}=\frac{1}{x_i}
$$
Example:
$$
\begin{aligned}
X&=\{A,B,C,D,E,F,G\}\\
S_1&=\{A,C,E\}\\
S_2&=\{B,C,F,G\}\\
S_3&=\{B,D,F,G\}\\
S_4&=\{D,G\}
\end{aligned}
$$
First we select $S_2$, then $cost(B)=cost(C)=cost(F)=cost(G)=\frac{1}{4}$.
Then we select $S_1$, then $cost(A)=cost(E)=\frac{1}{2}$.
Then we select $S_3$, then $cost(D)=1$.
If element $x$ was covered by greedy set cover due to the addition of set $S^i$ at step $i$, then the cost of $x$ is $\frac{1}{\delta(S^i)}$.
$$
\textup{Total cost of GSC}=\sum_{x\in X}c(x)=\sum_{i=1}^{|C|}\sum_{x\textup{ covered at iteration }i}c(x)=\sum_{i=1}^{|C|}\delta(S^i)\frac{1}{\delta(S^i)}=|C|
$$
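The cost assignment can be checked on the example above. This is a minimal sketch, assuming ties between equally good sets are broken by the order in which the sets are listed; the helper name `greedy_costs` is hypothetical.

```python
from fractions import Fraction


def greedy_costs(X, sets):
    """Run greedy set cover; return (picked set names, cost of each element)."""
    uncovered = set(X)
    picked, cost = [], {}
    while uncovered:
        # Greedy choice: the set covering the most uncovered elements
        # (max returns the first such set in insertion order on ties).
        name = max(sets, key=lambda s: len(sets[s] & uncovered))
        newly = sets[name] & uncovered
        if not newly:
            raise ValueError("the sets do not cover X")
        for x in newly:
            cost[x] = Fraction(1, len(newly))  # 1 / delta(S^i)
        picked.append(name)
        uncovered -= newly
    return picked, cost


X = set("ABCDEFG")
sets = {
    "S1": set("ACE"),
    "S2": set("BCFG"),
    "S3": set("BDFG"),
    "S4": set("DG"),
}
picked, cost = greedy_costs(X, sets)
total = sum(cost.values())
```

Exact fractions are used so that the total cost can be compared with $|C|$ without rounding error.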
Claim: Consider any set $S$ that is a subset of $X$. The cost paid by the greedy set cover for $S$ is at most $H_{|S|}$.
Suppose that greedy set cover (GSC) covers the elements of $S$ in the order $x_1,x_2,\ldots,x_{|S|}$, where $\{x_1,x_2,\ldots,x_{|S|}\}=S$.
Just before GSC covers $x_j$, the elements $\{x_j,x_{j+1},\ldots,x_{|S|}\}$ are not yet covered.
At this point, GSC has the option of picking $S$.
This implies that $\delta(S)$ is at least $|S|-j+1$.
GSC picks the set $\hat{S}$ for which $\delta(\hat{S})$ is maximized ($\hat{S}$ may be $S$ or another set that covers $x_j$).
So, $\delta(\hat{S})\geq \delta(S)\geq |S|-j+1$.
So the cost of $x_j$ is $\frac{1}{\delta(\hat{S})}\leq \frac{1}{\delta(S)}\leq \frac{1}{|S|-j+1}$.
Summing over all $j$, the cost of $S$ is at most $\sum_{j=1}^{|S|}\frac{1}{|S|-j+1}=H_{|S|}$.
Back to the proof of approximation ratio:
Let $C^*$ be optimal set cover.
$$
|C|=\sum_{x\in X}c(x)\leq \sum_{S_j\in C^*}\sum_{x\in S_j}c(x)
$$
This inequality holds because the sets in $C^*$ cover every element of $X$, and an element that is covered by more than one set is counted more than once.
By our claim, $\sum_{x\in S_j}c(x)\leq H_{|S_j|}$.
Let $d$ be the largest cardinality of any set in $C^*$.
$$
|C|\leq \sum_{S_j\in C^*}H_{|S_j|}\leq \sum_{S_j\in C^*}H_d=H_d|C^*|
$$
So the approximation ratio for greedy set cover is $H_d$.
EOP

349
pages/CSE347/CSE347_L9.md Normal file
View File

@@ -0,0 +1,349 @@
# Lecture 9
## Randomized Algorithms
### Hashing
Hashing with chaining:
Input: We have integers in the range $[0,N-1]=U$. We want to map them to a hash table $T$ with $m$ slots.
Hash function: $h:U\rightarrow [m]$
Goal: Hashing a set $S\subseteq U$, $|S|=n$ into $T$ such that the number of elements in each slot is at most $1$.
#### Collisions
When multiple keys are mapped to the same slot, we call it a collision. We keep a linked list of all the keys that map to the same slot.
**Runtime** of insert, query, delete of elements $=\Theta(\textup{length of the chain})$
**Worst-case** runtime of insert, query, delete of elements $=\Theta(n)$
Therefore, we want chains to be short, or $\Theta(1)$, as long as $|S|$ is reasonably sized. Equivalently, we want the keys of any set $S$ to hash **uniformly** across all slots.
#### Simple Uniform Hashing Assumptions
The $n$ elements we want to hash (the set $S$) are picked uniformly at random from $U$. Under this assumption, this simple hash function works fine:
$$
h(x)=x\mod m
$$
Question: What happens if an adversary knows this function and designs $S$ to make the worst-case runtime happen?
Answer: The adversary can make the runtime of each operation $\Theta(n)$ by simply making all the elements hash to the same slot.
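The attack is easy to demonstrate. In this minimal sketch the table size `m` and the number of keys `n` are hypothetical values chosen for illustration.

```python
from collections import Counter


def h(x, m):
    """The fixed hash function from the text: h(x) = x mod m."""
    return x % m


m = 8    # number of slots (hypothetical value)
n = 100  # number of keys the adversary inserts (hypothetical value)

# Every multiple of m hashes to slot 0, so the adversary picks only those.
adversarial_keys = [i * m for i in range(n)]
chain_lengths = Counter(h(x, m) for x in adversarial_keys)
longest_chain = max(chain_lengths.values())
```

All `n` keys end up in a single chain, so every operation on the table degrades to a linear scan.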
#### Randomization to the rescue
We don't want the adversary to know the hash function based on just looking at the code.
Idea: Randomize the choice of the hash function.
### Randomized Algorithm
#### Definition
A randomized algorithm is an algorithm that makes internal random choices.
2 kinds of randomized algorithms:
1. Las Vegas: The runtime is random, but the output is always correct.
2. Monte Carlo: The runtime is fixed, but the output is sometimes incorrect.
We will focus on Las Vegas algorithms in this course.
We measure the runtime by its expectation, for example $E[T(n)]=O(n)$, or by some other probabilistic quantity.
#### Randomization can help
Idea: Randomize the choice of hash function $h$ from a family of hash functions, $H$.
If we randomly pick a hash function from this family, then the probability that the hash function is bad on **any particular** set $S$ is small.
Intuitively, the adversary can not pick a bad input since most hash functions are good for any particular input $S$.
#### Universal Hashing: Goal
We want to design a universal family of hash functions, $H$, such that the probability that the hash table behaves badly on any input $S$ is small.
#### Universal Hashing: Definition
Suppose we have $m$ buckets in the hash table. We also have $2$ inputs $x\neq y$ and $x,y\in U$. We want $x$ and $y$ to be unlikely to hash to the same bucket.
$H$ is a universal **family** of hash functions if for any two elements $x\neq y$,
$$
Pr_{h\in H}[h(x)=h(y)]=\frac{1}{m}
$$
where $h$ is picked uniformly at random from the family $H$.
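As a concrete illustration that is not taken from the lecture, one well-known family is the Carter-Wegman construction $h_{a,b}(x)=((ax+b)\bmod p)\bmod m$, with $p$ a prime larger than every key, $1\leq a<p$ and $0\leq b<p$. For this family the collision probability is at most $\frac{1}{m}$, which is the usual relaxed form of the condition above. The sketch below uses hypothetical small parameters, enumerates the whole family, and checks the bound for one pair of keys.

```python
def make_hash(a, b, p, m):
    """One member h_{a,b} of the family: ((a*x + b) mod p) mod m."""
    return lambda x: ((a * x + b) % p) % m


def collision_probability(x, y, p, m):
    """Exact Pr over the whole family that h(x) == h(y), by enumeration."""
    family = [make_hash(a, b, p, m) for a in range(1, p) for b in range(p)]
    collisions = sum(1 for h in family if h(x) == h(y))
    return collisions / len(family)


p, m = 17, 5  # hypothetical: prime p larger than all keys, m slots
prob = collision_probability(3, 11, p, m)
```

Enumeration is only feasible for tiny `p`; in practice one would draw `a` and `b` at random once and keep that single function.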
#### Universal Hashing: Analysis
Claim: If we choose $h$ randomly from a universal family of hash functions, $H$, then the hash table will exhibit good behavior on any set $S$ of size $n$ with high probability.
Question: What are some good properties and what does it mean by with high probability?
Claim: Given a universal family of hash functions, $H$, and $S=\{a_1,a_2,\cdots,a_n\}\subset \mathbb{N}$. For any probability $0\leq \delta\leq 1$, if $n\leq \sqrt{2m\delta}$, the chance that no two keys hash to the same slot is $\geq1-\delta$.
Example: If we pick $\delta=\frac{1}{2}$, then as long as $n\leq\sqrt{m}$, the chance that no two keys hash to the same slot is $\geq\frac{1}{2}$.
If we pick $\delta=\frac{1}{3}$, then as long as $n\leq\sqrt{\frac{2}{3}m}$, the chance that no two keys hash to the same slot is $\geq\frac{2}{3}$.
Proof Strategy:
1. Compute the **expected number** of collisions. Note that a collision occurs when two different values are hashed to the same slot. (Indicator random variables)
2. Apply a "tail" bound that converts the expected value to probability. (Markov's inequality)
##### Compute the expected number of collisions
Let $m$ be the size of the hash table. $n$ is the number of keys in the set $S$. $N$ is the size of the universe.
For inputs $x,y\in S,x\neq y$, we define a random variable
$$
C_{xy}=
\begin{cases}
1 & \text{if } h(x)=h(y) \\
0 & \text{otherwise}
\end{cases}
$$
$C_{xy}$ is called an indicator random variable, that takes value $0$ or $1$.
The expected value of $C_{xy}$ is
$$
E[C_{xy}]=1\times Pr[C_{xy}=1]+0\times Pr[C_{xy}=0]=Pr[C_{xy}=1]=\frac{1}{m}
$$
Define $C_x$: random variable that represents the cost of inserting/searching/deleting $x$ from the hash table.
$C_x\leq$ total number of elements that collide with $x$ (= number of elements $y$ such that $h(x)=h(y)$).
$$
C_x=\sum_{y\in S,y\neq x,h(x)=h(y)}1
$$
So, $C_x=\sum_{y\in S,y\neq x}C_{xy}$.
By linearity of expectation,
$$
E[C_x]=\sum_{y\in S,y\neq x}E[C_{xy}]=\sum_{y\in S,y\neq x}\frac{1}{m}=\frac{n-1}{m}
$$
$E[C_x]=O(1)$ if $n=O(m)$. The total expected cost of $k$ insert/search operations is $O(k)$, by linearity of expectation.
Say $C$ is the total number of collisions.
$C=\frac{\sum_{x\in S}C_x}{2}$ because each collision is counted twice.
$$
E[C]=\frac{1}{2}\sum_{x\in S}E[C_x]=\frac{1}{2}\sum_{x\in S}\frac{n-1}{m}=\frac{n(n-1)}{2m}
$$
If we want $E[C]\leq \delta$, it suffices that $n\leq\sqrt{2m\delta}$, since then $\frac{n(n-1)}{2m}\leq\frac{n^2}{2m}\leq\delta$.
#### The probability of no collisions $C=0$
We know that the expected value of number of collisions is now $\leq \delta$, but what about the probability of **NO** collisions?
> Markov's inequality: for a non-negative random variable $X$ and $k>0$, $$Pr[X\geq k]\leq\frac{E[X]}{k}$$
> Equivalently, $Pr[X\geq k\cdot E[X]]\leq \frac{1}{k}$.
Apply this to $C$ with $k=\frac{1}{\delta}$. Since $E[C]\leq\delta$, we have $\frac{1}{\delta}E[C]\leq 1$, so
$$
Pr[C\geq 1]\leq Pr[C\geq \frac{1}{\delta}E[C]]\leq\delta
$$
So, if $n\leq\sqrt{2m\delta}$, then $Pr[C=0]\geq 1-\delta$: with probability at least $1-\delta$, you will have no collisions.
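To get a feel for the numbers, the sketch below evaluates $E[C]=\frac{n(n-1)}{2m}$ and the largest $n$ with $n\leq\sqrt{2m\delta}$. The table size and $\delta$ are hypothetical values, and the helper names are made up for this illustration.

```python
from fractions import Fraction
from math import isqrt


def expected_collisions(n, m):
    """E[C] = n(n-1)/(2m) for n keys and m slots under a universal family."""
    return Fraction(n * (n - 1), 2 * m)


def max_keys(m, delta):
    """Largest integer n with n^2 <= 2*m*delta, i.e. n <= sqrt(2*m*delta)."""
    return isqrt(int(2 * m * delta))


m, delta = 10_000, Fraction(1, 2)  # hypothetical table size and failure bound
n = max_keys(m, delta)
bound = expected_collisions(n, m)  # Markov: Pr[C >= 1] <= E[C]
```

With these values only about $\sqrt{m}$ keys can be stored before the no-collision guarantee drops below one half, which is the birthday-paradox behavior.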
#### More general conclusion
Claim: For a universal hash function family $H$, if number of keys $n\leq \sqrt{Bm\delta}$, then the probability that at most $B+1$ keys hash to the same slot is $> 1-\delta$.
### Example: Quicksort
Based on partitioning [assume all elements are distinct]: Partition($A[p\cdots r]$)
- Rearranges $A$ into $A[p\cdots q-1],A[q],A[q+1\cdots r]$
Runtime: $O(r-p)$, linear time.
```python
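def partition(A, p, r):
    # use the last element as the pivot
    x = A[r]
    lo = p
    for i in range(p, r):
        if A[i] < x:
            # move elements smaller than the pivot to the front
            A[lo], A[i] = A[i], A[lo]
            lo += 1
    # put the pivot between the two parts
    A[lo], A[r] = A[r], A[lo]
    return lo


def quicksort(A, p, r):
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)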
```
#### Runtime analysis
Let the number of elements in $A_{low}$ (the part to the left of the pivot) be $k$.
$$
T(n)=\Theta(n)+T(k)+T(n-k-1)
$$
By even split assumption, $k=\frac{n}{2}$.
$$
T(n)=T(\frac{n}{2})+T(\frac{n}{2}-1)+\Theta(n)\approx \Theta(n\log n)
$$
Which is approximately the same as merge sort.
_Average case analysis is always suspicious._
### Randomized Quicksort
- Pick a random pivot element.
- Analyze the expected runtime over the random choices of pivot.
```python
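import random


def randomized_partition(A, p, r):
    # pick the pivot uniformly at random from A[p..r] (randint is inclusive)
    ix = random.randint(p, r)
    x = A[ix]
    # move the pivot to the end, then partition as before
    A[r], A[ix] = A[ix], A[r]
    lo = p
    for i in range(p, r):
        if A[i] < x:
            A[lo], A[i] = A[i], A[lo]
            lo += 1
    A[lo], A[r] = A[r], A[lo]
    return lo


def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)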
```
$$
E[T(n)]=E(T(n-k-1)+T(k)+cn)=E(T(n-k-1))+E(T(k))+cn
$$
by linearity of expectation.
$$
Pr[\textup{pivot has rank }k]=\frac{1}{n}
$$
So,
$$
\begin{aligned}
E[T(n)]&=cn+\frac{1}{n}\sum_{k=0}^{n-1}(E[T(k)]+E[T(n-k-1)])\\
&=cn+\sum_{j=0}^{n-1}Pr[n-k-1=j]T(j)+\sum_{j=0}^{n-1}Pr[k=j]T(j)\\
&=cn+\sum_{j=0}^{n-1}\frac{1}{n}T(j)+\sum_{j=0}^{n-1}\frac{1}{n}T(j)\\
&=cn+\frac{2}{n}\sum_{j=0}^{n-1}T(j)
\end{aligned}
$$
Claim: the solution to this recurrence is $E[T(n)]=O(n\log n)$; more precisely, $T(n)\leq c'n\log n+1$ for a suitable constant $c'$.
Proof:
We prove by induction.
Base case: $n=1,T(n)=T(1)=c$
Inductive step: Assume that $T(k)\leq c'k\log k+1$ for all $k<n$ (taking $0\log 0=0$).
Then,
$$
\begin{aligned}
T(n)&=cn+\frac{2}{n}\sum_{k=0}^{n-1}T(k)\\
&\leq cn+\frac{2}{n}\sum_{k=0}^{n-1}(c'k\log k+1)\\
&=cn+\frac{2c'}{n}\sum_{k=0}^{n-1}k\log k+\frac{2}{n}\sum_{k=0}^{n-1}1
\end{aligned}
$$
Then we use the fact that $\sum_{k=0}^{n-1}k\log k\leq \frac{n^2\log n}{2}-\frac{n^2}{8}$ (can be proved by induction).
$$
\begin{aligned}
T(n)&\leq cn+\frac{2c'}{n}\left(\frac{n^2\log n}{2}-\frac{n^2}{8}\right)+\frac{2}{n}n\\
&=c'n\log n-\frac{1}{4}c'n+cn+2\\
&=(c'n\log n+1)-\left(\frac{1}{4}c'n-cn-1\right)
\end{aligned}
$$
We need $\frac{1}{4}c'n-cn-1\geq 0$.
That is, we choose $c'$ such that $\frac{1}{4}c'n\geq cn+1$ for all $n\geq 2$.
If $c'\geq 4c+2$ (for example $c'\geq 8c$ when $c\geq\frac{1}{2}$), then $\frac{1}{4}c'n\geq cn+\frac{n}{2}\geq cn+1$, so $T(n)\leq c'n\log n+1$.
$E[T(n)]\leq c'n\log n+1=O(n\log n)$
EOP
A more elegant proof:
Let $X_{ij}$ be an indicator random variable that is $1$ if element of rank $i$ is compared to element of rank $j$.
Running time: $$X=\sum_{i=0}^{n-2}\sum_{j=i+1}^{n-1}X_{ij}$$
The expected value of each indicator is
$$
E[X_{ij}]=Pr[X_{ij}=1]\times 1+Pr[X_{ij}=0]\times 0=Pr[X_{ij}=1]
$$
Summing these over all pairs gives the expected number of comparisons in randomized quicksort.
The expected running time is
$$
\begin{aligned}
E[X]&=E[\sum_{i=0}^{n-2}\sum_{j=i+1}^{n-1}X_{ij}]\\
&=\sum_{i=0}^{n-2}\sum_{j=i+1}^{n-1}E[X_{ij}]\\
&=\sum_{i=0}^{n-2}\sum_{j=i+1}^{n-1}Pr[X_{ij}=1]
\end{aligned}
$$
For any two elements $z_i,z_j\in S$ with $i<j$, $z_i$ is compared to $z_j$ exactly when $z_i$ or $z_j$ is the first element picked as a pivot among the $j-i+1$ elements of ranks $i$ through $j$. Each of these elements is equally likely to be picked first, so
$$
\begin{aligned}
Pr[X_{ij}=1]&=Pr[z_i\text{ is picked first}]+Pr[z_j\text{ is picked first}]\\
&=\frac{1}{j-i+1}+\frac{1}{j-i+1}\\
&=\frac{2}{j-i+1}
\end{aligned}
$$
So, with harmonic number, $H_n=\sum_{k=1}^{n}\frac{1}{k}$,
$$
\begin{aligned}
E[X]&=\sum_{i=0}^{n-2}\sum_{j=i+1}^{n-1}\frac{2}{j-i+1}\\
&\leq 2\sum_{i=0}^{n-2}\sum_{k=1}^{n-i-1}\frac{1}{k}\\
&\leq 2\sum_{i=0}^{n-2}c\log(n)\\
&=2c\log(n)\sum_{i=0}^{n-2}1\\
&=O(n\log n)
\end{aligned}
$$
EOP
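The double sum can also be evaluated exactly for small inputs. This is a minimal sketch with a hypothetical input size; it compares the exact sum with the looser bound $2nH_n$, which follows from the derivation because each inner sum is at most $H_n$.

```python
from fractions import Fraction


def expected_comparisons(n):
    """Exact E[X] = sum over ranks i < j of 2 / (j - i + 1)."""
    return sum(
        Fraction(2, j - i + 1)
        for i in range(n - 1)
        for j in range(i + 1, n)
    )


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


n = 100  # hypothetical input size
exact = expected_comparisons(n)
upper = 2 * n * harmonic(n)  # looser 2 n H_n bound
```

For three elements the sum is $1+1+\frac{2}{3}=\frac{8}{3}$: adjacent ranks are always compared, while the smallest and largest are compared only if the middle element is not the first pivot.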

View File

@@ -0,0 +1,34 @@
# Exam 1 review
## Greedy
A Greedy Algorithm is an algorithm whose solution applies the same choice rule at each step over and over until no more choices can be made.
- Stating and Proving a Greedy Algorithm
- State your algorithm (“at this step, make this choice”)
- Greedy Choice Property (Exchange Argument)
- Inductive Structure
- Optimal Substructure
- "Simple Induction"
- Asymptotic Runtime
## Divide and conquer
Stating and Proving a Divide and Conquer Algorithm
- Describe the divide, conquer, and combine steps of your algorithm.
- The combine step is the most important part of a divide and conquer algorithm, and in your recurrence this step is the "$f(n)$", or work done at each subproblem level. You need to show that you can combine the results of your subproblems somehow to get the solution for the entire problem.
- Provide and prove a base case (when you can divide no longer)
- Prove your induction step: suppose subproblems (two problems of size n/2, usually) of the same kind are solved optimally. Then, because of the combine step, the overall problem (of size n) will be solved optimally.
- Provide recurrence and solve for its runtime (Master Method)
## Maximum Flow
Given a directed graph with edge capacities, a source node, and a sink node, the goal is to see how much "flow" you can push from the source to the sink simultaneously.
Finding the maximum flow can be solved by the Ford-Fulkerson Algorithm. Runtime (from lecture slides): $O(F(|V|+|E|))$, where $F$ is the value of the maximum flow.
Fattest Path improvement: $O(\log|V|(|V|+|E|))$
Min Cut-Max Flow: the maximum flow from source $s$ to sink $t$ is equal to the minimum capacity of an $s-t$ cut.
A cut is a partition of a graph into two disjoint sets by removing edges connecting the two parts. An $s-t$ cut will put $s$ and $t$ into different sets.
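For review, here is a minimal sketch of the augmenting-path idea behind Ford-Fulkerson. It finds each augmenting path with BFS (the Edmonds-Karp variant, which may differ from the path rule used in lecture), assumes integer capacities and $s\neq t$, and uses a hypothetical example network.

```python
from collections import deque


def max_flow(capacity, s, t):
    """Ford-Fulkerson with BFS augmenting paths; capacity maps (u, v) -> int."""
    residual = dict(capacity)
    adj = {}
    for u, v in capacity:
        residual.setdefault((v, u), 0)  # reverse edge for undoing flow
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    flow = 0
    while True:
        # BFS for an s-t path with spare residual capacity.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path is left

        # Find the bottleneck along the path, then push that much flow.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck


# Hypothetical network: the cut {s} | {a, b, t} has capacity 3 + 2 = 5.
capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
value = max_flow(capacity, "s", "t")
```

The returned value equals the capacity of the cut around `s`, as the Min Cut-Max Flow statement predicts.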

29
pages/CSE347/_meta.js Normal file
View File

@@ -0,0 +1,29 @@
export default {
Exam_reviews: "Exam reviews",
CSE347_L1: "Lecture 1",
CSE347_L2: "Lecture 2",
CSE347_L3: "Lecture 3",
CSE347_L4: "Lecture 4",
CSE347_L5: "Lecture 5",
CSE347_L6: "Lecture 6",
CSE347_L7: "Lecture 7",
CSE347_L8: "Lecture 8",
CSE347_L9: "Lecture 9",
CSE347_L10: "Lecture 10",
CSE347_L11: "Lecture 11",
CSE347_L12: {
display: 'hidden'
},
CSE347_L13: {
display: 'hidden'
},
CSE347_L14: {
display: 'hidden'
},
CSE347_L15: {
display: 'hidden'
},
index: {
display: 'hidden'
}
}

1
pages/CSE347/index.mdx Normal file
View File

@@ -0,0 +1 @@
# Test

View File

@@ -0,0 +1,13 @@
course_code=input('We will follow the naming pattern of {class}_L{lecture number}.md, enter the course code to start.\n')
start=input('enter the number of lecture that you are going to start.\n')
end=input('Enter the end of lecture (exclusive).\n')
start=int(start)
end=int(end)
while start<end:
    # create the lecture file with its title line
    fp = open(f'{course_code}_L{start}.md', 'w')
    fp.write(f'# Lecture {start}')
    fp.close()
    start+=1
print("Complete")