diff --git a/content/CSE5519/CSE5519_L3.md b/content/CSE5519/CSE5519_L3.md
new file mode 100644
index 0000000..68756da
--- /dev/null
+++ b/content/CSE5519/CSE5519_L3.md
@@ -0,0 +1,164 @@
+# CSE5519 Lecture 3
+
+## Reminders
+
+First example notebook due Sep 18
+
+Project proposal due Sep 23
+
+## Continued: A brief history (time) of computer vision
+
+### Theme changes
+
+#### 1980s
+
+- “Definitive” detectors
+  - Edges: Canny (1986); corners: Harris & Stephens (1988)
+- Multiscale image representations
+  - Witkin (1983), Burt & Adelson (1984), Koenderink (1984, 1987), etc.
+- Markov Random Field models: Geman & Geman (1984)
+- Segmentation by energy minimization
+  - Kass, Witkin & Terzopoulos (1987), Mumford & Shah (1989)
+
+#### Conferences, journals, books
+
+- Conferences: ICPR (1973), CVPR (1983), ICCV (1987), ECCV (1990)
+- Journals: TPAMI (1979), IJCV (1987)
+- Books: Duda & Hart (1972), Marr (1982), Ballard & Brown (1982), Horn (1986)
+
+#### 1980s: The dead ends
+
+- Alignment-based recognition
+  - Faugeras & Hebert (1983), Grimson & Lozano-Perez (1984), Lowe (1985), Huttenlocher & Ullman (1987), etc.
+- Aspect graphs
+  - Koenderink & Van Doorn (1979), Plantinga & Dyer (1986), Hebert & Kanade (1985), Ikeuchi & Kanade (1988), Gigus & Malik (1990)
+- Invariants: Mundy & Zisserman (1992)
+
+#### 1980s: Meanwhile...
+
+- Neocognitron: Fukushima (1980)
+- Back-propagation: Rumelhart, Hinton & Williams (1986)
+  - Origins in control theory and optimization: Kelley (1960), Dreyfus (1962), Bryson & Ho (1969), Linnainmaa (1970)
+  - Application to neural networks: Werbos (1974)
+  - Interesting blog post: “Backpropagating through time” (or: how come BP wasn’t invented earlier?)
+- Parallel Distributed Processing: Rumelhart et al. (1987)
+- Neural networks for digit recognition: LeCun et al. (1989)
+
+#### 1990s
+
+Multi-view geometry, statistical and appearance-based models for recognition, and the first approaches to (class-specific) object detection.
+
+Geometry (mostly) solved
+
+- Fundamental matrix: Faugeras (1992)
+- Normalized 8-point algorithm: Hartley (1997)
+- RANSAC for robust fundamental matrix estimation: Torr & Murray (1997)
+- Bundle adjustment: Triggs et al. (1999)
+- Hartley & Zisserman book (2000)
+- Projective structure from motion: Faugeras & Luong (2001)
+
+Data enters the scene
+
+- Appearance-based models: Turk & Pentland (1991), Murase & Nayar (1995)
+  - PCA for face recognition: Turk & Pentland (1991)
+  - Image manifolds
+
+Keypoint-based image indexing
+
+- Schmid & Mohr (1996), Lowe (1999)
+
+Constellation models for object categories
+
+- Burl, Weber & Perona (1998), Weber, Welling & Perona (2000)
+
+First sustained use of classifiers and negative data
+
+- Face detectors: Rowley, Baluja & Kanade (1996), Osuna, Freund & Girosi (1997), Schneiderman & Kanade (1998), Viola & Jones (2001)
+- Convolutional nets: LeCun et al. (1998)
+
+Graph cut image inference
+
+- Boykov, Veksler & Zabih (1998)
+
+Segmentation
+
+- Normalized cuts: Shi & Malik (2000)
+- Berkeley segmentation dataset: Martin et al. (2001)
+
+Video processing
+
+- Layered motion models: Adelson & Wang (1993)
+- Robust optical flow: Black & Anandan (1993)
+- Probabilistic curve tracking: Isard & Blake (1998)
+
+#### 2000s: Keypoints and reconstruction
+
+Keypoints craze (see the sketch after this list)
+
+- Kadir & Brady (2001), Mikolajczyk & Schmid (2002), Matas et al. (2004), Lowe (2004), Bay et al. (2006), etc.
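+
+As a concrete illustration of the detect-describe-match pipeline these works popularized, here is a minimal sketch. It assumes OpenCV’s bundled SIFT implementation and two hypothetical image files (`img1.jpg`, `img2.jpg`); it is an illustration, not code from the lecture.
+
+```python
+# Detect keypoints, compute descriptors, and match them between two images.
+# Assumption: opencv-python >= 4.4 (which includes SIFT) is installed and the
+# two example images exist on disk.
+import cv2
+
+img1 = cv2.imread("img1.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file names
+img2 = cv2.imread("img2.jpg", cv2.IMREAD_GRAYSCALE)
+
+sift = cv2.SIFT_create()                       # Lowe (2004)-style detector/descriptor
+kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + 128-D descriptors
+kp2, des2 = sift.detectAndCompute(img2, None)
+
+# Brute-force matching with Lowe's ratio test to keep only distinctive matches
+matcher = cv2.BFMatcher()
+matches = matcher.knnMatch(des1, des2, k=2)
+good = [m for m, n in matches if m.distance < 0.75 * n.distance]
+print(f"{len(good)} putative correspondences")
+```
+
+The same kind of putative correspondences feed the robust geometry pipeline above (RANSAC, bundle adjustment) and the large-scale reconstruction work below.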
+
+3D reconstruction "in the wild"
+
+- SfM in the wild
+- Multi-view stereo, stereo on GPUs
+
+Generic object recognition
+
+- Constellation models
+- Bags of features
+- Datasets: Caltech-101 -> ImageNet
+
+Generic object detection
+
+- PASCAL dataset
+- HOG, deformable part models
+
+Action and activity recognition
+
+- Miscellaneous early efforts
+
+#### 1990s-2000s: Dead ends (?)
+
+Probabilistic graphical models
+
+Perceptual organization
+
+#### 2010s: Deep learning, big data
+
+- Deep learning methods can be more accurate (often much more accurate)
+- They are faster (often much faster)
+- They are adaptable to new problems
+
+Deep Convolutional Neural Networks
+
+- Many layers, some of which are convolutional (usually near the input)
+- Early layers "extract features"
+- Trained using stochastic gradient descent on very large datasets
+- Many possible loss functions (depending on task)
+
+Additional benefits:
+
+- High-quality software frameworks
+- "New" network layers
+  - Dropout (enables training many models simultaneously)
+  - ReLU activation (enables faster training because gradients don’t saturate)
+- Bigger datasets
+  - reduce overfitting
+  - improve robustness
+  - enable larger, deeper networks
+- Deeper networks eliminate the need for hand-engineered features
+
+### Where did we go wrong?
+
+In retrospect, computer vision has had several periods of "spinning its wheels":
+
+- We've always **prioritized methods that could already do interesting things** over potentially more promising methods that could not yet deliver
+- We've undervalued simple methods, data, and learning
+- When nothing worked, we **distracted ourselves with fancy math**
+- On a few occasions, we unaccountably **ignored methods that later proved to be "game changers"** (RANSAC, SIFT)
+- We've had some problems with **bandwagon jumping and intellectual snobbery**
+
+But it's not clear whether any of it mattered in the end.
diff --git a/content/CSE5519/_meta.js b/content/CSE5519/_meta.js
index 2a4e437..4b7c17a 100644
--- a/content/CSE5519/_meta.js
+++ b/content/CSE5519/_meta.js
@@ -5,4 +5,8 @@ export default {
   },
   CSE5519_L1: "CSE5519 Advances in Computer Vision (Lecture 1)",
   CSE5519_L2: "CSE5519 Advances in Computer Vision (Lecture 2)",
+  CSE5519_L3: "CSE5519 Advances in Computer Vision (Lecture 3)",
+  "---":{
+    type: 'separator'
+  },
 }
\ No newline at end of file