Files
NoteNextra-origin/content/CSE347/CSE347_L1.md
2025-07-06 12:40:25 -05:00

9.9 KiB
Raw Blame History

Lecture 1

Greedy Algorithms

  • Builds up a solution by making a series of small decisions that optimize some objective.
  • Make one irrevocable choice at a time, creating smaller and smaller sub-problems of the same kind as the original problem.
  • There are many potential greedy strategies and picking the right one can be challenging.

A Scheduling Problem

You manage a giant space telescope.

  • There are n research projects that want to use it to make observations.
  • Only one project can use the telescope at a time.
  • Project p_i needs the telescope starting at time s_i and running for a length of time t_i.
  • Goal: schedule as many as possible

Formally

Input:

  • Given a set P of projects, |P|=n
  • Each request p_i\in P occupies interval [s_i,f_i), where f_i=s_i+t_i

Goal: Choose a subset \Pi\sqsubseteq P such that

  1. No two projects in \Pi have overlapping intervals.
  2. The number of selected projects |\Pi| is maximized.

Shortest Interval

Counter-example: [1,10],[9,12],[11,20]

Earliest start time

Counter-example: [1,10],[2,3],[4,5]

Fewest Conflicts

Counter-example: [1,2],[1,4],[1,4],[3,6],[7,8],[5,8],[5,8]

Earliest finish time

Correct... but why

Theorem of Greedy Strategy (Earliest Finishing Time)

Say this greedy strategy (Earliest Finishing Time) picks a set \Pi of intervals, some other strategy picks a set O of intervals.

Assume sorted by finishing time

  • \Pi=\{i_1,i_2,...,i_k\},|\Pi|=k
  • O=\{j_1,j_2,...,j_m\},|O|=m

We want to show that |\Pi|\geq|O|,k>m

Lemma: For all r<k,f_{i_r}\leq f_{j_r}

We proceed the proof by induction.

  • Base Case, when r=1. \Pi is the earliest finish time, and O cannot pick a interval with earlier finish time, so f_{i_r}\leq f_{j_r}

  • Inductive step, when r>1. Since \Pi_r is the earliest finish time, so for any set in O_r, f_{i_{r-1}}\leq f_{j_{r-1}}, for any j_r inserted to O_r, it can also be inserted to \Pi_r. So O_r cannot pick an interval with earlier finish time than Pi since it will also be picked by definition if O_r is the optimal solution OPT.

Problem of “Greedy Stays Ahead” Proof

  • Every problem has very different theorem.
  • It can be challenging to even write down the correct statement that you must prove.
  • We want a systematic approach to prove the correctness of greedy algorithms.

Road Map to Prove Greedy Algorithm

1. Make a Choice

Pick an interval based on greedy choice, say q

Proof: Greedy Choice Property: Show that using our first choice is not "fatal" at least one optimal solution makes this choice.

Techniques: Exchange Argument: "If an optimal solution does not choose q, we can turn it into an equally good solution that does."

Let \Pi^* be any optimal solution for project set P.

  • If q\in \Pi^*, we are done.
  • Otherwise, let x be the optimal solution from \Pi^* that does not pick q. We create another solution \bar{\Pi^*} that replace x with q, and prove that the \bar{\Pi^*} is as optimal as \Pi^*

2. Create a smaller instance P' of the original problem

P' has the same optimization criteria.

Proof: Inductive Structure: Show that after making the first choice, we're left with a smaller version of the same problem, whose solution we can safely combine with the first choice.

Let P' be the subproblem left after making first choice q in problem P and let \Pi' be an optimal solution to P'. Then \Pi=\Pi^*\cup\{q\} is an optimal solution to P.

$P'=P-{q}-{$projects conflicting with q\}

3. Solution: Union of choices that we made

Union of choices that we made.

Proof: Optimal Substructure: Show that if we solve the subproblem optimally, adding our first choice creates an optimal solution to the whole problem.

Let q be the first choice, P' be the subproblem left after making q in problem P, \Pi' be an optimal solution to P'. We claim that \Pi=\Pi'\cup \{q\} is an optimal solution to P.

We proceed the proof by contradiction.

Assume that \Pi=\Pi'+\{q\} is not optimal.

By Greedy choice property GCP. we already know that \exists an optimal solution \Pi^* for problem P that contains q. If \Pi is not optimal, cost(\Pi^*)<cost(\Pi). Then since \Pi^*-q is also a feasible solution to P'. cost(\Pi^*-q)>cost(\Pi-q)=\Pi' which leads to contradiction that \Pi' is an optimal solution to P'.

4. Put 1-3 together to write an inductive proof of the Theorem

This is independent of problem, same for every problem.

Use scheduling problem as an example:

Theorem: given a scheduling problem P, if we repeatedly choose the remaining feasible project with the earliest finishing time, we will construct an optimal feasible solution to P.

Proof: We proceed by induction on |P|. (based on the size of problem P).

  • Base case: |P|=1.
  • Inductive step.
    • Inductive hypothesis: For all problems of size <n, earliest finishing time (EFT) gives us an optimal solution.
    • EFT is optimal for problem of size n.
    • Proof: Once we pick q, because of greedy choice. $P'=P={q} -{$interval that conflict with q\}. |P'|<n, By Inductive hypothesis, EFT gives us an optimal solution to P', but by inductive substructure, and optimal substructure. \Pi' (optimal solution to P'), we have optimal solution to P.

this step always holds as long as the previous three properties hold, and we don't usually write the whole proof.

# Algorithm construction for Interval scheduling problem
def schedule(p):
  # sorting takes O(n)=nlogn
  p=sorted(p,key=lambda x:x[1])
  res=[P[0]]
  # O(n)=n
  for i in p[1:]:
    if res[-1][-1]<i[0]:
      res.append(i)
  return res

Extra Examples:

File compression problem

You have n files of different sizes f_i.

You want to merge them to create a single file. merge(f_i,f_j) takes time f_i+f_j and creates a file of size f_k=f_i+f_j.

Goal: Find the order of merges such that the total time to merge is minimized.

Thinking process: The merge process is a binary tree and each of the file is the leaf of the tree.

The total time required =\sum^n_{i=1} d_if_i, where d_i is the depth of the file in the compression tree.

So compressing the smaller file first may yield a faster run time.

Proof:

Greedy Choice Property

Construct part of the solution by making a locally good decision.

Lemma: \exist some optimal solution that merges the two smallest file first, lets say [f_1,f_2]

Proof: Exchange argument

  • Case 1: Optimal choice already merges f_1,f_2, done. Time order does not matter in this problem at some point.
    • eg: [2,2,3], merge 2,3 and 2,2 first don't change the total cost
  • Case 2: Optimal choice does not merges f_1 and f_2.
    • Suppose the optimal solution merges f_x,f_y as the deepest merge.
    • Then d_x\geq d_1,d_y\geq d_2. Exchanging f_1,f_2 with f_x,f_y would yield a strictly less greater solution since f_1,f_2 already smallest.

Inductive Structure

  • We can combine feasible solution to the subproblem P' with the greedy choice to get a feasible solution to P
  • After making greedy choice q, we are left with a strictly smaller subproblem P' with the same optimality criteria of the original problem

Proof: Optimal Substructure: Show that if we solve the subproblem optimally, adding our first choice creates an optimal solution to the whole problem.

Let q be the first choice, P' be the subproblem left after making q in problem P, \Pi^* be an optimal solution to P'. We claim that \Pi=\Pi'\cup \{q\} is an optimal solution to P.

We proceed the proof by contradiction.

Assume that \Pi=\Pi^*+\{q\} is not optimal.

By Greedy choice property GCP. we already know that \Pi^* is optimal solution that contains q. Then |\Pi^*|>|\Pi| \Pi^*-q is also feasible solution to P'. |\Pi^*-q|>|\Pi-q|=\Pi' which is an optimal solution to P' which leads to contradiction.

Proof: Smaller problem size

After merging the smallest two files into one, we have strictly less files waiting to merge.

Optimal Substructure

  • We can combine optimal solution to the subproblem P' with the greedy choice to get a optimal solution to P

Step 4 ignored, same for all greedy problems.

Conclusion: Greedy Algorithm

  • Algorithm
  • Runtime Complexity
  • Proof
    • Greedy Choice Property
      • Construct part of the solution by making a locally good decision.
    • Inductive Structure
      • We can combine feasible solution to the subproblem P' with the greedy choice to get a feasible solution to P
      • After making greedy choice q, we are left with a strictly smaller subproblem P' with the same optimality criteria of the original problem
    • Optimal Substructure
      • We can combine optimal solution to the subproblem P' with the greedy choice to get a optimal solution to P
  • Standard Contradiction Argument simplifies it

Review:

Essence of master method

Let a\geq 1 and b>1 be constants, let f(n) be a function, and let T(n) be defined on the nonnegative integers by the recurrence


T(n)=aT(\frac{n}{b})+f(n)

where we interpret n/b to mean either ceiling or floor of n/b. c_{crit}=\log_b a Then T(n) has to following asymptotic bounds.

  • Case I: if f(n) = O(n^{c}) (f(n) "dominates" n^{\log_b a-c}) where c<c_{crit}, then T(n) = \Theta(n^{c_{crit}})

  • Case II: if f(n) = \Theta(n^{c_{crit}}), (f(n), n^{\log_b a-c} have no dominate) then T(n) = \Theta(n^{\log_b a} \log_2 n)

    Extension for f(n)=\Theta(n^{critical\_value}*(\log n)^k)

    • if k>-1

      T(n)=\Theta(n^{critical\_value}*(\log n)^{k+1})

    • if k=-1

      T(n)=\Theta(n^{critical\_value}*\log \log n)

    • if k<-1

      T(n)=\Theta(n^{critical\_value})

  • Case III: if f(n) = \Omega(n^{log_b a+c}) (n^{log_b a-c} "dominates" f(n)) for some constant c >0, and if a f(n/b)<= c f(n) for some constant c <1 then for all sufficiently large n, T(n) = \Theta(n^{log_b a+c})