update

content/CSE5519/CSE5519_L3.md (new file)

@@ -0,0 +1,164 @@

# CSE5519 Lecture 3

## Reminders

- First Example notebook due Sep 18
- Project proposal due Sep 23

## Continued: A brief history (time) of computer vision

### Theme changes

#### 1980s

- “Definitive” detectors
  - Edges: Canny (1986); corners: Harris & Stephens (1988)
- Multiscale image representations
  - Witkin (1983), Burt & Adelson (1984), Koenderink (1984, 1987), etc.
- Markov Random Field models: Geman & Geman (1984)
- Segmentation by energy minimization
  - Kass, Witkin & Terzopoulos (1987), Mumford & Shah (1989)

#### Conferences, journals, books

- Conferences: ICPR (1973), CVPR (1983), ICCV (1987), ECCV (1990)
- Journals: TPAMI (1979), IJCV (1987)
- Books: Duda & Hart (1972), Marr (1982), Ballard & Brown (1982), Horn (1986)

#### 1980s: The dead ends

- Alignment-based recognition
  - Faugeras & Hebert (1983), Grimson & Lozano-Perez (1984), Lowe (1985), Huttenlocher & Ullman (1987), etc.
- Aspect graphs
  - Koenderink & Van Doorn (1979), Plantinga & Dyer (1986), Hebert & Kanade (1985), Ikeuchi & Kanade (1988), Gigus & Malik (1990)
- Invariants: Mundy & Zisserman (1992)

#### 1980s: Meanwhile...

- Neocognitron: Fukushima (1980)
- Back-propagation: Rumelhart, Hinton & Williams (1986)
  - Origins in control theory and optimization: Kelley (1960), Dreyfus (1962), Bryson & Ho (1969), Linnainmaa (1970)
  - Application to neural networks: Werbos (1974)
  - Interesting blog post: "Backpropagating through time. Or, how come BP hasn't been invented earlier?"
- Parallel Distributed Processing: Rumelhart et al. (1987)
- Neural networks for digit recognition: LeCun et al. (1989)

#### 1990s

Multi-view geometry, statistical and appearance-based models for recognition, and the first approaches to (class-specific) object detection.

Geometry (mostly) solved (a small estimation sketch follows this list):

- Fundamental matrix: Faugeras (1992)
- Normalized 8-point algorithm: Hartley (1997)
- RANSAC for robust fundamental matrix estimation: Torr & Murray (1997)
- Bundle adjustment: Triggs et al. (1999)
- Hartley & Zisserman book (2000)
- Projective structure from motion: Faugeras & Luong (2001)
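
The normalized 8-point algorithm is short enough to sketch. Below is a minimal NumPy illustration (not code from the lecture); `x1` and `x2` are hypothetical N×2 arrays of matched pixel coordinates, and in practice this estimator would be run inside a RANSAC loop over random 8-point subsets, as in Torr & Murray (1997).

```python
import numpy as np

def normalize(pts):
    """Hartley normalization: translate to the centroid and scale so the
    mean distance from the origin is sqrt(2); also return the 3x3 transform."""
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2) / np.sqrt(((pts - centroid) ** 2).sum(axis=1)).mean()
    T = np.array([[scale, 0, -scale * centroid[0]],
                  [0, scale, -scale * centroid[1]],
                  [0, 0, 1]])
    return np.c_[pts, np.ones(len(pts))] @ T.T, T

def eight_point(x1, x2):
    """Estimate F with x2^T F x1 = 0 from at least 8 correspondences."""
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # Each correspondence contributes one linear constraint on the 9 entries of F.
    A = np.column_stack([p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
                         p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
                         p1[:, 0], p1[:, 1], np.ones(len(p1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)                 # null vector of A, reshaped row-major
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0]) @ Vt    # enforce rank 2
    F = T2.T @ F @ T1                        # undo the normalization
    return F / np.linalg.norm(F)
```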

Data enters the scene:

- Appearance-based models: Turk & Pentland (1991), Murase & Nayar (1995)
  - PCA for face recognition: Turk & Pentland (1991) (a small sketch follows this list)
  - Image manifolds
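
The PCA ("eigenfaces") recipe behind Turk & Pentland's face recognizer fits in a few lines. Here is a minimal NumPy sketch (an illustration, not lecture code); `faces` is a hypothetical (num_images, height*width) array of flattened grayscale training faces and `probe` is one flattened query face.

```python
import numpy as np

def fit_eigenfaces(faces, k=20):
    """PCA of the training faces; rows of `eigenfaces` span the 'face space'."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = Vt[:k]                      # top-k principal components
    coords = centered @ eigenfaces.T         # each training face as k coefficients
    return mean_face, eigenfaces, coords

def recognize(probe, mean_face, eigenfaces, coords):
    """Project the query into face space and return the nearest training face."""
    q = (probe - mean_face) @ eigenfaces.T
    return int(np.argmin(np.linalg.norm(coords - q, axis=1)))
```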

Keypoint-based image indexing:

- Schmid & Mohr (1996), Lowe (1999)

Constellation models for object categories:

- Burl, Weber & Perona (1998), Weber, Welling & Perona (2000)

First sustained use of classifiers and negative data:

- Face detectors: Rowley, Baluja & Kanade (1996), Osuna, Freund & Girosi (1997), Schneiderman & Kanade (1998), Viola & Jones (2001)
- Convolutional nets: LeCun et al. (1998)

Graph cut image inference:

- Boykov, Veksler & Zabih (1998)

Segmentation:

- Normalized cuts: Shi & Malik (2000)
- Berkeley segmentation dataset: Martin et al. (2001)

Video processing:

- Layered motion models: Adelson & Wang (1993)
- Robust optical flow: Black & Anandan (1993)
- Probabilistic curve tracking: Isard & Blake (1998)

#### 2000s: Keypoints and reconstruction

Keypoints craze:

- Kadir & Brady (2001), Mikolajczyk & Schmid (2002), Matas et al. (2004), Lowe (2004), Bay et al. (2006), etc.

3D reconstruction "in the wild":

- Structure from motion (SfM) in the wild
- Multi-view stereo, stereo on GPUs

Generic object recognition:

- Constellation models
- Bags of features
- Datasets: Caltech-101 -> ImageNet

Generic object detection:

- PASCAL dataset
- HOG, deformable part models

Action and activity recognition:

- "Misc. early efforts"

#### 1990s-2000s: Dead ends (?)

- Probabilistic graphical models
- Perceptual organization

#### 2010s: Deep learning, big data

Compared with what came before, deep learning methods have three big selling points:

- They can be more accurate (often much more accurate).
- They are faster (often much faster).
- They are adaptable to new problems.

Deep Convolutional Neural Networks (a minimal training sketch follows this list):

- Many layers, some of which are convolutional (usually near the input)
- Early layers "extract features"
- Trained using stochastic gradient descent on very large datasets
- Many possible loss functions (depending on the task)
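
A minimal sketch of that recipe, assuming PyTorch (the lecture does not prescribe a framework), 32×32 RGB inputs, and 10 classes; the layer sizes and hyperparameters are illustrative only. It also uses the dropout and ReLU layers mentioned under "additional benefits" below.

```python
import torch
import torch.nn as nn

# Convolutional layers near the input "extract features"; a linear head scores classes.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(p=0.5),
    nn.Linear(64 * 16 * 16, 10),
)

loss_fn = nn.CrossEntropyLoss()   # the loss depends on the task; here, classification
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_step(images, labels):
    """One stochastic gradient descent step on a mini-batch."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()               # back-propagation
    optimizer.step()
    return loss.item()
```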

Additional benefits:

- High-quality software frameworks
- "New" network layers
  - Dropout (enables training many models simultaneously)
  - ReLU activation (enables faster training because gradients don't saturate)
- Bigger datasets
  - Reduce overfitting
  - Improve robustness
  - Enable larger, deeper networks
- Deeper networks eliminate the need for hand-engineered features

### Where did we go wrong?

In retrospect, computer vision has had several periods of "spinning its wheels":

- We've always **prioritized methods that could already do interesting things** over potentially more promising methods that could not yet deliver.
- We've undervalued simple methods, data, and learning.
- When nothing worked, we **distracted ourselves with fancy math**.
- On a few occasions, we unaccountably **ignored methods that later proved to be "game changers"** (RANSAC, SIFT).
- We've had some problems with **bandwagon jumping and intellectual snobbery**.

But it's not clear whether any of it mattered in the end.

@@ -5,4 +5,8 @@ export default {
  },
  CSE5519_L1: "CSE5519 Advances in Computer Vision (Lecture 1)",
  CSE5519_L2: "CSE5519 Advances in Computer Vision (Lecture 2)",
  CSE5519_L3: "CSE5519 Advances in Computer Vision (Lecture 3)",
  "---": {
    type: 'separator'
  },
}