CSE347 Analysis of Algorithms (Lecture 1)
Greedy Algorithms
- Builds up a solution by making a series of small decisions that optimize some objective.
- Make one irrevocable choice at a time, creating smaller and smaller sub-problems of the same kind as the original problem.
- There are many potential greedy strategies and picking the right one can be challenging.
A Scheduling Problem
You manage a giant space telescope.
- There are n research projects that want to use it to make observations.
- Only one project can use the telescope at a time.
- Project p_i needs the telescope starting at time s_i and running for a length of time t_i.
- Goal: schedule as many projects as possible.
Formally
Input:
- Given a set P of projects, |P|=n
- Each request p_i\in P occupies interval [s_i,f_i), where f_i=s_i+t_i
Goal: Choose a subset \Pi\subseteq P such that
- No two projects in \Pi have overlapping intervals.
- The number of selected projects |\Pi| is maximized.
Shortest Interval
Counter-example: [1,10],[9,12],[11,20]
Earliest start time
Counter-example: [1,10],[2,3],[4,5]
Fewest Conflicts
Counter-example: [0,2],[3,5],[6,8],[9,11],[1,4],[1,4],[1,4],[7,10],[7,10],[7,10],[4,7]. The interval [4,7] has the fewest conflicts (only 2), but picking it blocks both [3,5] and [6,8], so greedy gets at most 3 intervals instead of the optimal 4.
Earliest finish time
Correct... but why?
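A concrete way to see why the first three rules fail while earliest finish time succeeds is to compare each greedy rule against a brute-force optimum on the counter-examples above. This is a sketch; the helper names (`compatible`, `brute_force_opt`, `greedy`) are mine, not from the lecture.

```python
from itertools import combinations

def compatible(sel):
    # intervals are half-open [s, f): each must finish before the next starts
    sel = sorted(sel)
    return all(a[1] <= b[0] for a, b in zip(sel, sel[1:]))

def brute_force_opt(intervals):
    # exhaustively find the size of the largest compatible subset
    for r in range(len(intervals), 0, -1):
        if any(compatible(c) for c in combinations(intervals, r)):
            return r
    return 0

def greedy(intervals, key):
    # consider intervals in `key` order, keeping each one that stays compatible
    chosen = []
    for iv in sorted(intervals, key=key):
        if compatible(chosen + [iv]):
            chosen.append(iv)
    return len(chosen)

ivs = [(1, 10), (9, 12), (11, 20)]
print(greedy(ivs, key=lambda iv: iv[1] - iv[0]))  # shortest interval: 1
print(greedy(ivs, key=lambda iv: iv[1]))          # earliest finish: 2
print(brute_force_opt(ivs))                       # optimum: 2
```

On the shortest-interval counter-example, the shortest-first rule gets 1 interval while earliest finish time matches the brute-force optimum of 2.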
Theorem of Greedy Strategy (Earliest Finishing Time)
Say this greedy strategy (earliest finishing time) picks a set \Pi of intervals, and some other strategy picks a set O of intervals.
Assume both are sorted by finishing time:
\Pi=\{i_1,i_2,...,i_k\}, |\Pi|=k; O=\{j_1,j_2,...,j_m\}, |O|=m
We want to show that |\Pi|\geq|O|, i.e. k\geq m.
Lemma: For all r\leq\min(k,m), f_{i_r}\leq f_{j_r}.
We proceed by induction on r.
- Base case (r=1): The greedy strategy picks the interval with the earliest finish time overall, so O cannot pick a first interval that finishes earlier; hence f_{i_1}\leq f_{j_1}.
- Inductive step (r>1): By the inductive hypothesis, f_{i_{r-1}}\leq f_{j_{r-1}}. Interval j_r starts after j_{r-1} finishes, hence after i_{r-1} finishes, so j_r is still available when the greedy strategy makes its r-th pick. Since the greedy strategy picks the available interval with the earliest finish time, f_{i_r}\leq f_{j_r}.
Finally, suppose k<m. By the lemma, f_{i_k}\leq f_{j_k}, so j_{k+1} starts after i_k finishes and would still be available to the greedy strategy, contradicting that it stopped after k picks. Hence k\geq m.
Problem of “Greedy Stays Ahead” Proof
- Every problem requires a very different theorem.
- It can be challenging even to write down the correct statement that you must prove.
- We want a systematic approach to prove the correctness of greedy algorithms.
Road Map to Prove Greedy Algorithm
1. Make a Choice
Pick an interval based on greedy choice, say q
Proof: Greedy Choice Property: Show that using our first choice is not "fatal" – at least one optimal solution makes this choice.
Techniques: Exchange Argument: "If an optimal solution does not choose q, we can turn it into an equally good solution that does."
Let \Pi^* be any optimal solution for project set P.
- If q\in\Pi^*, we are done.
- Otherwise, let x be the choice in \Pi^* that conflicts with q. We create another solution \bar{\Pi^*} that replaces x with q, and prove that \bar{\Pi^*} is as good as \Pi^*.
2. Create a smaller instance P' of the original problem
P' has the same optimization criteria.
Proof: Inductive Structure: Show that after making the first choice, we're left with a smaller version of the same problem, whose solution we can safely combine with the first choice.
Let P' be the subproblem left after making first choice q in problem P, and let \Pi' be an optimal solution to P'. Then \Pi=\Pi'\cup\{q\} is an optimal solution to P.
P'=P-\{q\}-\{projects conflicting with q\}
3. Solution: Union of choices that we made
Proof: Optimal Substructure: Show that if we solve the subproblem optimally, adding our first choice creates an optimal solution to the whole problem.
Let q be the first choice, P' be the subproblem left after making q in problem P, \Pi' be an optimal solution to P'. We claim that \Pi=\Pi'\cup \{q\} is an optimal solution to P.
We proceed by contradiction.
Assume that \Pi=\Pi'\cup\{q\} is not optimal.
By the greedy choice property (GCP), we already know that there exists an optimal solution \Pi^* for problem P that contains q. If \Pi is not optimal, then \Pi^* is strictly better than \Pi. Since \Pi^*-\{q\} is also a feasible solution to P', it is strictly better than \Pi-\{q\}=\Pi', which contradicts the fact that \Pi' is an optimal solution to P'.
4. Put 1-3 together to write an inductive proof of the Theorem
This step is independent of the problem; it is the same for every problem.
Using the scheduling problem as an example:
Theorem: Given a scheduling problem P, if we repeatedly choose the remaining feasible project with the earliest finishing time, we construct an optimal feasible solution to P.
Proof: We proceed by induction on |P| (the size of the problem).
- Base case: |P|=1. We pick the single project, which is trivially optimal.
- Inductive step:
  - Inductive hypothesis: For all problems of size <n, earliest finishing time (EFT) gives an optimal solution.
  - Claim: EFT is optimal for a problem of size n.
  - Proof: Once we pick q, by the greedy choice property, P'=P-\{q\}-\{intervals that conflict with q\} and |P'|<n. By the inductive hypothesis, EFT gives an optimal solution \Pi' to P'. By the inductive structure and optimal substructure, \Pi'\cup\{q\} is an optimal solution to P.
This step always holds as long as the previous three properties hold, so we don't usually write out the whole proof.
# Algorithm construction for the interval scheduling problem
def schedule(p):
    # sorting by finish time takes O(n log n)
    p = sorted(p, key=lambda x: x[1])
    res = [p[0]]
    # the scan takes O(n)
    for i in p[1:]:
        # intervals are half-open [s, f), so a project starting exactly
        # when the previous one finishes is compatible
        if res[-1][1] <= i[0]:
            res.append(i)
    return res
Extra Examples:
File compression problem
You have n files of different sizes f_i.
You want to merge them to create a single file. merge(f_i,f_j) takes time f_i+f_j and creates a file of size f_k=f_i+f_j.
Goal: Find the order of merges such that the total time to merge is minimized.
Thinking process: The merge process forms a binary tree, and each file is a leaf of the tree.
The total time required is \sum^n_{i=1} d_i f_i, where d_i is the depth of file f_i in the compression tree.
So merging the smaller files first may yield a faster total run time.
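The greedy rule suggested above, repeatedly merging the two smallest remaining files, can be sketched with a min-heap. This is an illustrative sketch; the function name `total_merge_time` is mine, not from the lecture.

```python
import heapq

def total_merge_time(sizes):
    # greedy rule: always merge the two smallest remaining files
    heap = list(sizes)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b               # merge(a, b) takes time a + b
        heapq.heappush(heap, a + b)  # the merged file has size a + b
    return total

print(total_merge_time([2, 2, 3]))  # merges (2,2) for 4, then (4,3) for 7: total 11
```

Each file is popped and re-pushed O(n) times in total, so with heap operations at O(log n) each, the whole procedure runs in O(n log n).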
Proof:
Greedy Choice Property
Construct part of the solution by making a locally good decision.
Lemma: \exists some optimal solution that merges the two smallest files first, say f_1 and f_2.
Proof: Exchange argument
- Case 1: The optimal solution already merges f_1 and f_2 at some point; done. The order in time of disjoint merges does not matter, since the total cost depends only on each file's depth in the merge tree (e.g. for [2,2,3,3], merging (2,2) before or after (3,3) gives the same total cost).
- Case 2: The optimal solution does not merge f_1 and f_2 together.
  - Suppose the optimal solution merges f_x and f_y at the deepest level.
  - Then d_x\geq d_1 and d_y\geq d_2. Exchanging f_1,f_2 with f_x,f_y yields a solution that costs no more, since f_1,f_2 are already the smallest files.
Inductive Structure
- We can combine a feasible solution to the subproblem P' with the greedy choice to get a feasible solution to P.
- After making greedy choice q, we are left with a strictly smaller subproblem P' with the same optimality criteria as the original problem.
Proof: Optimal Substructure: Show that if we solve the subproblem optimally, adding our first choice creates an optimal solution to the whole problem.
Let q be the first choice, P' be the subproblem left after making q in problem P, and \Pi' be an optimal solution to P'. We claim that \Pi=\Pi'\cup\{q\} is an optimal solution to P.
We proceed by contradiction.
Assume that \Pi=\Pi'\cup\{q\} is not optimal.
By the greedy choice property (GCP), we know there exists an optimal solution \Pi^* for P that contains q. If \Pi is not optimal, then \Pi^* is strictly better than \Pi. Since \Pi^*-\{q\} is also a feasible solution to P', it is strictly better than \Pi-\{q\}=\Pi', which contradicts the fact that \Pi' is an optimal solution to P'.
Proof: Smaller problem size
After merging the smallest two files into one, we have strictly fewer files waiting to be merged.
Optimal Substructure
- We can combine an optimal solution to the subproblem P' with the greedy choice to get an optimal solution to P.
Step 4 is skipped here; it is the same for all greedy problems.
Conclusion: Greedy Algorithm
- Algorithm
- Runtime Complexity
- Proof
  - Greedy Choice Property
    - Construct part of the solution by making a locally good decision.
  - Inductive Structure
    - We can combine a feasible solution to the subproblem P' with the greedy choice to get a feasible solution to P.
    - After making greedy choice q, we are left with a strictly smaller subproblem P' with the same optimality criteria as the original problem.
  - Optimal Substructure
    - We can combine an optimal solution to the subproblem P' with the greedy choice to get an optimal solution to P.
    - The standard contradiction argument simplifies this step.
Review:
Essence of master method
Let a\geq 1 and b>1 be constants, let f(n) be a function, and let T(n) be defined on the nonnegative integers by the recurrence
T(n)=aT(\frac{n}{b})+f(n)
where we interpret n/b to mean either \lceil n/b\rceil or \lfloor n/b\rfloor. Let c_{crit}=\log_b a. Then T(n) has the following asymptotic bounds.
- Case I: if f(n)=O(n^c) for some c<c_{crit} (n^{c_{crit}} "dominates" f(n)), then T(n)=\Theta(n^{c_{crit}}).
- Case II: if f(n)=\Theta(n^{c_{crit}}) (neither f(n) nor n^{c_{crit}} dominates), then T(n)=\Theta(n^{c_{crit}}\log n).
  Extension for f(n)=\Theta(n^{c_{crit}}(\log n)^k):
  - if k>-1, then T(n)=\Theta(n^{c_{crit}}(\log n)^{k+1})
  - if k=-1, then T(n)=\Theta(n^{c_{crit}}\log\log n)
  - if k<-1, then T(n)=\Theta(n^{c_{crit}})
- Case III: if f(n)=\Omega(n^c) for some constant c>c_{crit} (f(n) "dominates" n^{c_{crit}}), and if af(n/b)\leq c'f(n) for some constant c'<1 and all sufficiently large n (the regularity condition), then T(n)=\Theta(f(n)).
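As a quick sanity check of the three cases, here are three standard recurrences (examples of my own choosing, not from the lecture) run through the master method:

```latex
\begin{aligned}
&\textbf{Case I: } T(n)=8T(n/2)+n^2.\quad a=8,\ b=2,\ c_{crit}=\log_2 8=3;\\
&\quad f(n)=n^2=O(n^2)\text{ with } 2<3,\ \text{so } T(n)=\Theta(n^3).\\
&\textbf{Case II: } T(n)=2T(n/2)+n\ \text{(merge sort)}.\quad c_{crit}=\log_2 2=1;\\
&\quad f(n)=\Theta(n^1),\ \text{so } T(n)=\Theta(n\log n).\\
&\textbf{Case III: } T(n)=2T(n/2)+n^2.\quad c_{crit}=1;\ f(n)=n^2=\Omega(n^2)\text{ with } 2>1;\\
&\quad \text{regularity: } 2f(n/2)=n^2/2=\tfrac{1}{2}f(n),\ \text{so } T(n)=\Theta(n^2).
\end{aligned}
```

In each example, compare the exponent of f(n) against c_{crit} to choose the case, then read off the bound.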