9.3 KiB
Lecture 8
NP-optimization problem
Cannot be solved in polynomial time.
Example:
- Maximum independent set
- Minimum vertex cover
What can we do?
- solve small instances
- hard instances are rare - average case analysis
- solve special cases
- find an approximate solution
Approximation algorithms
We find a "good" solution in polynomial time, but may not be optimal.
Example:
- Minimum vertex cover: we will find a small vertex cover, but not necessarily the smallest one.
- Maximum independent set: we will find a large independent set, but not necessarily the largest one.
Question: How do we quantify the quality of the solution?
Approximation ratio
Intuition:
How good is an algorithm A compared to an optimal solution in the worst case?
Definition:
Consider algorithm A for an NP-optimization problem L. Say for any instance l, A finds a solution output c_A(l) and the optimal solution is c^*(l).
Approximation ratio is either:
\max_{l \in L} \frac{c_A(l)}{c^*(l)}=\alpha
for maximization problems, or
\min_{l \in L} \frac{c^A(l)}{c_*(l)}=\alpha
for minimization problems.
Example:
Alice's Algorithm, A, finds a vertex cover of size c_A(l) for instance l(G). The optimal vertex cover has size c^*(l).
We want approximation ratio to be as close to 1 as possible.
Vertex cover:
A vertex cover is a set of vertices that touches all edges.
Let's try an approximation algorithm for the vertex cover problem, called Greedy cover.
Greedy cover
Pick any uncovered edge, both its endpoints are added to the cover C, until all edges are covered.
Runtime: O(m)
Claim: Greedy cover is correct, and it finds a vertex cover.
Proof:
Algorithm only terminates when all edges are covered.
Claim: Greedy cover is a 2-approximation algorithm.
Proof:
Look at the two edges we picked.
Either it is covered by Greedy cover, or it is not.
If it is not covered by Greedy cover, then we will add both endpoints to the cover.
In worst case, Greedy cover will add both endpoints of each edge to the cover. (Consider the graph with disjoint edges.)
Thus, the size of the vertex cover found by Greedy cover is at most twice the size of the optimal vertex cover.
Thus, Greedy cover is a 2-approximation algorithm.
Min-cut:
Given a graph
Gand two verticessandt, find the minimum cut betweensandt.Max-cut:
Given a graph
G, find the maximum cut.
Local cut
Algorithm:
- start with an arbitrary cut of
G. - While you can move a vertex from one side to the other side while increasing the size of the cut, do so.
- Return the cut found.
We will prove its:
- Runtime
- Feasibility
- Approximation ratio
Runtime for local cut
Since size of cut is at most |E|, the runtime is O(m).
When we move a vertex from one side to the other side, the size of the cut increases by at least 1.
Thus, the algorithm terminates in at most |V| steps.
So the runtime is O(|E||V|^2).
Feasibility for local cut
The algorithm only terminates when no more vertices can be moved.
Thus, the cut found is a feasible solution.
Approximation ratio for local cut
This is a half-approximation algorithm.
We need to show that the size of the cut found is at least half of the size of the optimal cut.
We could first upper bound the size of the optimal cut is at most |E|.
We will then prove that solution we found is at least half of the optimal cut \frac{|E|}{2} for any graph G.
Proof:
When we terminate, no vertex could be moved
Therefore, The number of crossing edges is at least the number of non-crossing edges.
Let d(u) be the degree of vertex u\in V.
The total number of crossing edges for vertex u is at least \frac{1}{2}d(u).
Summing over all vertices, the total number of crossing edges is at least \frac{1}{2}\sum_{u\in V}d(u)=\frac{1}{2}|E|.
So the total number of non-crossing edges is at most \frac{|E|}{2}.
QED
Set cover
Problem:
You are collecting a set of magic cards.
X is the set of all possible cards. You want at least one of each card.
Each dealer j has a pack S_j\subseteq X of cards. You have to buy entire pack or none from dealer j.
Goal: What is the least number of packs you need to buy to get all cards?
Formally:
Input X is a universe of n elements, and a collection of subsets of X, Y=\{S_1, S_2, \ldots, S_m\}\subseteq X.
Goal: Pick C\subseteq Y such that \bigcup_{S_i\in C}S_i=X, and |C| is minimized.
Set cover is an NP-optimization problem. It is a generalization of the vertex cover problem.
Greedy set cover
Algorithm:
- Start with empty set
C. - While there is an element
xinXthat is not covered, pick one such elementx\in S_iwhereS_iis the set that has not been picked before. - Add
S_itoC. - Return
C.
def greedy_set_cover(X, Y):
# X is the set of elements
# Y is the collection of sets, hashset by default
C = []
def non_covered_elements(X, C):
# return the elements in X that are not covered by C
# O(|X|)
return [x for x in X if not any(x in c for c in C)]
non_covered = non_covered_elements(X, C)
# O(|X|) every loop reduce the size of non_covered by 1
while non_covered:
max_cover,max_set = 0,None
# O(|Y|)
for S in Y:
# Intersection of two sets is O(min(|X|,|S|))
cur_cover = len(set(non_covered) & set(S))
if cur_cover > max_cover:
max_cover,max_set = cur_cover,S
C.append(max_set)
non_covered = non_covered_elements(X, C)
return C
It is not optimal.
Need to prove its:
- Correctness:
Keep picking until all elements are covered. - Runtime:
O(|X||Y|^2) - Approximation ratio:
Approximation ratio for greedy set cover
Harmonic number:
H_n=\sum_{i=1}^n\frac{1}{i}=\frac{1}{1}+\frac{1}{2}+\frac{1}{3}+\cdots+\frac{1}{n}=\Theta(\log n)
We claim that the size of the set cover found is at most H_n\log n times the size of the optimal set cover.
First bound:
Proof:
If the optimal picks k sets, then the size of the set cover found is at most (1+\log n)k sets.
Let n=|X|.
Observe that
For the first round, the elements that we not covered is n.
|U_0|=n
In the second round, the elements that we not covered is at most |U_0|-x where x=|S_1| is the number of elements in the set picked in the first round.
|U_1|=|U_0|-|S_1|
...
So x_i\geq \frac{|U_{i-1}|}{k}.
We proceed by contradiction.
Suppose all sets in the optimal solution are < \frac{|U_0|}{k}. Then the sum of the sizes of the sets in the optimal solution is < |U_0|=n.
There exists a least ratio of selection of sets determined by k_i. Otherwise the function (selecting the set cover) will not terminate (no such sets exists)
Some math magics:
(1-\frac{1}{k})^k\leq \frac{1}{e}
So n(1-\frac{1}{k})^{|C|-1}=1, |C|\leq 1+k\ln n.
So the size of the set cover found is at most (1+\ln n)k.
QED
So the greedy set cover is not too bad...
Second bound:
Greedy set cover is a $H_d$-approximation algorithm of set cover.
Proof:
Assign a cost to the elements of X according to the decisions of the greedy set cover.
Let \delta(S^i) be the new number of elements covered by set S^i.
\delta(S^i)=|S_i\cap U_{i-1}|
If the element x is added by step i, when set S_i is picked, then the cost of x to
\frac{1}{\delta(S^i)}=\frac{1}{x_i}
Example:
\begin{aligned}
X&=\{A,B,C,D,E,F,G\}\\
S_1&=\{A,C,E\}\\
S_2&=\{B,C,F,G\}\\
S_3&=\{B,D,F,G\}\\
S_4&=\{D,G\}
\end{aligned}
First we select S_2, then cost(B)=cost(C)=cost(F)=cost(G)=\frac{1}{4}.
Then we select S_1, then cost(A)=cost(E)=\frac{1}{2}.
Then we select S_3, then cost(D)=1.
If element x was covered by greedy set cover due to the addition of set S^i at step i, then the cost of x is \frac{1}{\delta(S^i)}.
\textup{Total cost of GSC}=\sum_{x\in X}c(x)=\sum_{i=1}^{|C|}\sum_{X\textup{ covered at iteration }i}c(x)=\sum_{i=1}^{|C|}\delta(S^i)\frac{1}{\delta(S^i)}=|C|
Claim: Consider any set S that is a subset of X. The cost paid by the greedy set cover for S is at most H_{|S|}.
Suppose that the greedy set covers S in order x_1,x_2,\ldots,x_{|S|}, where \{x_1,x_2,\ldots,x_{|S|}\}=S.
When GSC covers x_j, \{x_j,x_{j+1},\ldots,x_{|S|}\} are not covered.
At this point, the GSC has the option of picking S
This implies that the \delta(S) is at least |S|-j+1.
Assume that S is picked \hat{S} for which \delta(\hat{S}) is maximized (\hat{S} may be S or other sets that have not covered x_j).
So, \delta(\hat{S})\geq \delta(S)\geq |S|-j+1.
So the cost of x_j is \delta(\hat{S})\leq \frac{1}{\delta(S)}\leq \frac{1}{|S|-j+1}.
Summing over all j, the cost of S is at most \sum_{j=1}^{|S|}\frac{1}{|S|-j+1}=H_{|S|}.
Back to the proof of approximation ratio:
Let C^* be optimal set cover.
|C|=\sum_{x\in X}c(x)\leq \sum_{S_j\in C^*}\sum_{x\in S_j}c(x)
This inequality holds because of counting element that is covered by more than one set.
Since \sum_{x\in S_j}c(x)\leq H_{|S_j|}, by our claim.
Let d be the largest cardinality of any set in C^*.
|C|\leq \sum_{S_j\in C^*}H_{|S_j|}\leq \sum_{S_j\in C^*}H_d=H_d|C^*|
So the approximation ratio for greedy set cover is H_d.
QED