# CSE347 Analysis of Algorithms (Lecture 1) ## Greedy Algorithms * Builds up a solution by making a series of small decisions that optimize some objective. * Make one irrevocable choice at a time, creating smaller and smaller sub-problems of the same kind as the original problem. * There are many potential greedy strategies and picking the right one can be challenging. ### A Scheduling Problem You manage a giant space telescope. * There are $n$ research projects that want to use it to make observations. * Only one project can use the telescope at a time. * Project $p_i$ needs the telescope starting at time $s_i$ and running for a length of time $t_i$. * Goal: schedule as many as possible Formally Input: * Given a set $P$ of projects, $|P|=n$ * Each request $p_i\in P$ occupies interval $[s_i,f_i)$, where $f_i=s_i+t_i$ Goal: Choose a subset $\Pi\sqsubseteq P$ such that 1. No two projects in $\Pi$ have overlapping intervals. 2. The number of selected projects $|\Pi|$ is maximized. #### Shortest Interval Counter-example: `[1,10],[9,12],[11,20]` #### Earliest start time Counter-example: `[1,10],[2,3],[4,5]` #### Fewest Conflicts Counter-example: `[1,2],[1,4],[1,4],[3,6],[7,8],[5,8],[5,8]` #### Earliest finish time Correct... but why #### Theorem of Greedy Strategy (Earliest Finishing Time) Say this greedy strategy (Earliest Finishing Time) picks a set $\Pi$ of intervals, some other strategy picks a set $O$ of intervals. Assume sorted by finishing time * $\Pi=\{i_1,i_2,...,i_k\},|\Pi|=k$ * $O=\{j_1,j_2,...,j_m\},|O|=m$ We want to show that $|\Pi|\geq|O|,k>m$ #### Lemma: For all $r1. Since $\Pi_r$ is the earliest finish time, so for any set in $O_r$, $f_{i_{r-1}}\leq f_{j_{r-1}}$, for any $j_r$ inserted to $O_r$, it can also be inserted to $\Pi_r$. So $O_r$ cannot pick an interval with earlier finish time than $Pi$ since it will also be picked by definition if $O_r$ is the optimal solution $OPT$. #### Problem of “Greedy Stays Ahead” Proof * Every problem has very different theorem. * It can be challenging to even write down the correct statement that you must prove. * We want a systematic approach to prove the correctness of greedy algorithms. ### Road Map to Prove Greedy Algorithm #### 1. Make a Choice Pick an interval based on greedy choice, say $q$ Proof: **Greedy Choice Property**: Show that using our first choice is not "fatal" – at least one optimal solution makes this choice. Techniques: **Exchange Argument**: "If an optimal solution does not choose $q$, we can turn it into an equally good solution that does." Let $\Pi^*$ be any optimal solution for project set $P$. - If $q\in \Pi^*$, we are done. - Otherwise, let $x$ be the optimal solution from $\Pi^*$ that does not pick $q$. We create another solution $\bar{\Pi^*}$ that replace $x$ with $q$, and prove that the $\bar{\Pi^*}$ is as optimal as $\Pi^*$ #### 2. Create a smaller instance $P'$ of the original problem $P'$ has the same optimization criteria. Proof: **Inductive Structure**: Show that after making the first choice, we're left with a smaller version of the same problem, whose solution we can safely combine with the first choice. Let $P'$ be the subproblem left after making first choice $q$ in problem $P$ and let $\Pi'$ be an optimal solution to $P'$. Then $\Pi=\Pi^*\cup\{q\}$ is an optimal solution to $P$. $P'=P-\{q\}-\{$projects conflicting with $q\}$ #### 3. Solution: Union of choices that we made Union of choices that we made. Proof: **Optimal Substructure**: Show that if we solve the subproblem optimally, adding our first choice creates an optimal solution to the *whole* problem. Let $q$ be the first choice, $P'$ be the subproblem left after making $q$ in problem $P$, $\Pi'$ be an optimal solution to $P'$. We claim that $\Pi=\Pi'\cup \{q\}$ is an optimal solution to $P$. We proceed the proof by contradiction. Assume that $\Pi=\Pi'+\{q\}$ is not optimal. By Greedy choice property $GCP$. we already know that $\exists$ an optimal solution $\Pi^*$ for problem $P$ that contains $q$. If $\Pi$ is not optimal, $cost(\Pi^*)cost(\Pi-q)=\Pi'$ which leads to contradiction that $\Pi'$ is an optimal solution to $P'$. #### 4. Put 1-3 together to write an inductive proof of the Theorem This is independent of problem, same for every problem. Use scheduling problem as an example: Theorem: given a scheduling problem $P$, if we repeatedly choose the remaining feasible project with the earliest finishing time, we will construct an optimal feasible solution to $P$. Proof: We proceed by induction on $|P|$. (based on the size of problem $P$). - Base case: $|P|=1$. - Inductive step. - Inductive hypothesis: For all problems of size $|\Pi|$ $\Pi^*-q$ is also feasible solution to $P'$. $|\Pi^*-q|>|\Pi-q|=\Pi'$ which is an optimal solution to $P'$ which leads to contradiction. Proof: **Smaller problem size** After merging the smallest two files into one, we have strictly less files waiting to merge. #### Optimal Substructure * We can combine optimal solution to the subproblem $P'$ with the greedy choice to get a optimal solution to $P$ Step 4 ignored, same for all greedy problems. ### Conclusion: Greedy Algorithm * Algorithm * Runtime Complexity * Proof * Greedy Choice Property * Construct part of the solution by making a locally good decision. * Inductive Structure * We can combine feasible solution to the subproblem $P'$ with the greedy choice to get a feasible solution to $P$ * After making greedy choice $q$, we are left with a strictly smaller subproblem $P'$ with the same optimality criteria of the original problem * Optimal Substructure * We can combine optimal solution to the subproblem $P'$ with the greedy choice to get a optimal solution to $P$ * Standard Contradiction Argument simplifies it ## Review: ### Essence of master method Let $a\geq 1$ and $b>1$ be constants, let $f(n)$ be a function, and let $T(n)$ be defined on the nonnegative integers by the recurrence $$ T(n)=aT(\frac{n}{b})+f(n) $$ where we interpret $n/b$ to mean either ceiling or floor of $n/b$. $c_{crit}=\log_b a$ Then $T(n)$ has to following asymptotic bounds. * Case I: if $f(n) = O(n^{c})$ ($f(n)$ "dominates" $n^{\log_b a-c}$) where $c-1$ $T(n)=\Theta(n^{critical\_value}*(\log n)^{k+1})$ * if $k=-1$ $T(n)=\Theta(n^{critical\_value}*\log \log n)$ * if $k<-1$ $T(n)=\Theta(n^{critical\_value})$ * Case III: if $f(n) = \Omega(n^{log_b a+c})$ ($n^{log_b a-c}$ "dominates" $f(n)$) for some constant $c >0$, and if a $f(n/b)<= c f(n)$ for some constant $c <1$ then for all sufficiently large $n$, $T(n) = \Theta(n^{log_b a+c})$