diff --git a/content/CSE5519/CSE5519_L3.md b/content/CSE5519/CSE5519_L3.md
new file mode 100644
index 0000000..68756da
--- /dev/null
+++ b/content/CSE5519/CSE5519_L3.md
@@ -0,0 +1,164 @@
+# CSE5519 Lecture 3
+
+## Reminders
+
+First example notebook due Sep 18
+
+Project proposal due Sep 23
+
+## Continued: A brief history (time) of computer vision
+
+### Theme changes
+
+#### 1980s
+
+- “Definitive” detectors
+  - Edges: Canny (1986); corners: Harris & Stephens (1988)
+- Multiscale image representations
+  - Witkin (1983), Burt & Adelson (1984), Koenderink (1984, 1987), etc.
+- Markov Random Field models: Geman & Geman (1984)
+- Segmentation by energy minimization
+  - Kass, Witkin & Terzopoulos (1987), Mumford & Shah (1989)
+
+#### Conferences, journals, books
+
+- Conferences: ICPR (1973), CVPR (1983), ICCV (1987), ECCV (1990)
+- Journals: TPAMI (1979), IJCV (1987)
+- Books: Duda & Hart (1972), Marr (1982), Ballard & Brown (1982), Horn (1986)
+
+#### 1980s: The dead ends
+
+- Alignment-based recognition
+  - Faugeras & Hebert (1983), Grimson & Lozano-Perez (1984), Lowe (1985), Huttenlocher & Ullman (1987), etc.
+- Aspect graphs
+  - Koenderink & Van Doorn (1979), Plantinga & Dyer (1986), Hebert & Kanade (1985), Ikeuchi & Kanade (1988), Gigus & Malik (1990)
+- Invariants: Mundy & Zisserman (1992)
+
+#### 1980s: Meanwhile...
+
+- Neocognitron: Fukushima (1980)
+- Back-propagation: Rumelhart, Hinton & Williams (1986)
+  - Origins in control theory and optimization: Kelley (1960), Dreyfus (1962), Bryson & Ho (1969), Linnainmaa (1970)
+  - Application to neural networks: Werbos (1974)
+  - Interesting blog post: “Backpropagating through time” (or: how come BP wasn’t invented earlier?)
+- Parallel Distributed Processing: Rumelhart et al. (1987)
+- Neural networks for digit recognition: LeCun et al. (1989)
+
+#### 1990s
+
+Multi-view geometry, statistical and appearance-based models for recognition, and the first approaches to (class-specific) object detection.
+
+Geometry (mostly) solved
+
+- Fundamental matrix: Faugeras (1992)
+- Normalized 8-point algorithm: Hartley (1997)
+- RANSAC for robust fundamental matrix estimation: Torr & Murray (1997)
+- Bundle adjustment: Triggs et al. (1999)
+- Hartley & Zisserman book (2000)
+- Projective structure from motion: Faugeras & Luong (2001)
+
+Data enters the scene
+
+- Appearance-based models: Turk & Pentland (1991), Murase & Nayar (1995)
+  - PCA for face recognition: Turk & Pentland (1991)
+  - Image manifolds
+
+Keypoint-based image indexing
+
+- Schmid & Mohr (1996), Lowe (1999)
+
+Constellation models for object categories
+
+- Burl, Weber & Perona (1998), Weber, Welling & Perona (2000)
+
+First sustained use of classifiers and negative data
+
+- Face detectors: Rowley, Baluja & Kanade (1996), Osuna, Freund & Girosi (1997), Schneiderman & Kanade (1998), Viola & Jones (2001)
+- Convolutional nets: LeCun et al. (1998)
+
+Graph cut image inference
+
+- Boykov, Veksler & Zabih (1998)
+
+Segmentation
+
+- Normalized cuts: Shi & Malik (2000)
+- Berkeley segmentation dataset: Martin et al. (2001)
+
+Video processing
+
+- Layered motion models: Adelson & Wang (1993)
+- Robust optical flow: Black & Anandan (1993)
+- Probabilistic curve tracking: Isard & Blake (1998)
+
+#### 2000s: Keypoints and reconstruction
+
+Keypoints craze (see the sketch after this list)
+
+- Kadir & Brady (2001), Mikolajczyk & Schmid (2002), Matas et al. (2004), Lowe (2004), Bay et al. (2006), etc.
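+
+As a concrete illustration of the detect-describe-match pipeline these works popularized, here is a minimal sketch. It assumes OpenCV’s bundled SIFT implementation and two hypothetical image files (`img1.jpg`, `img2.jpg`); it is an illustration, not code from the lecture.
+
+```python
+# Detect keypoints, compute descriptors, and match them between two images.
+# Assumption: opencv-python >= 4.4 (which includes SIFT) is installed and the
+# two example images exist on disk.
+import cv2
+
+img1 = cv2.imread("img1.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file names
+img2 = cv2.imread("img2.jpg", cv2.IMREAD_GRAYSCALE)
+
+sift = cv2.SIFT_create()                       # Lowe (2004)-style detector/descriptor
+kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + 128-D descriptors
+kp2, des2 = sift.detectAndCompute(img2, None)
+
+# Brute-force matching with Lowe's ratio test to keep only distinctive matches
+matcher = cv2.BFMatcher()
+matches = matcher.knnMatch(des1, des2, k=2)
+good = [m for m, n in matches if m.distance < 0.75 * n.distance]
+print(f"{len(good)} putative correspondences")
+```
+
+The same kind of putative correspondences feed the robust geometry pipeline above (RANSAC, bundle adjustment) and the large-scale reconstruction work below.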
+
+3D reconstruction "in the wild"
+
+- SfM in the wild
+- Multi-view stereo, stereo on GPUs
+
+Generic object recognition
+
+- Constellation models
+- Bags of features
+- Datasets: Caltech-101 -> ImageNet
+
+Generic object detection
+
+- PASCAL dataset
+- HOG, deformable part models
+
+Action and activity recognition
+
+- Miscellaneous early efforts
+
+#### 1990s-2000s: Dead ends (?)
+
+Probabilistic graphical models
+
+Perceptual organization
+
+#### 2010s: Deep learning, big data
+
+- Deep learning methods can be more accurate (often much more accurate)
+- They are faster (often much faster)
+- They are adaptable to new problems
+
+Deep Convolutional Neural Networks
+
+- Many layers, some of which are convolutional (usually near the input)
+- Early layers "extract features"
+- Trained using stochastic gradient descent on very large datasets
+- Many possible loss functions (depending on task)
+
+Additional benefits:
+
+- High-quality software frameworks
+- "New" network layers
+  - Dropout (enables training many models simultaneously)
+  - ReLU activation (enables faster training because gradients don’t saturate)
+- Bigger datasets
+  - reduce overfitting
+  - improve robustness
+  - enable larger, deeper networks
+- Deeper networks eliminate the need for hand-engineered features
+
+### Where did we go wrong?
+
+In retrospect, computer vision has had several periods of "spinning its wheels":
+
+- We've always **prioritized methods that could already do interesting things** over potentially more promising methods that could not yet deliver
+- We've undervalued simple methods, data, and learning
+- When nothing worked, we **distracted ourselves with fancy math**
+- On a few occasions, we unaccountably **ignored methods that later proved to be "game changers"** (RANSAC, SIFT)
+- We've had some problems with **bandwagon jumping and intellectual snobbery**
+
+But it's not clear whether any of it mattered in the end.
diff --git a/content/CSE5519/_meta.js b/content/CSE5519/_meta.js
index 2a4e437..4b7c17a 100644
--- a/content/CSE5519/_meta.js
+++ b/content/CSE5519/_meta.js
@@ -5,4 +5,8 @@ export default {
   },
   CSE5519_L1: "CSE5519 Advances in Computer Vision (Lecture 1)",
   CSE5519_L2: "CSE5519 Advances in Computer Vision (Lecture 2)",
+  CSE5519_L3: "CSE5519 Advances in Computer Vision (Lecture 3)",
+  "---":{
+    type: 'separator'
+  },
 }
\ No newline at end of file