# CSE5519 Advances in Computer Vision (Topic F: 2021 and before: Representation Learning)
## A Simple Framework for Contrastive Learning of Visual Representations
[link to the paper](https://arxiv.org/pdf/2002.05709)
~~Laughing my ass off when I see ~75% top-1 accuracy on ImageNet from a linear probe. I can't help wondering what the authors think a few years later, now that self-supervised representation learning has become a dominant paradigm in Computer Vision.~~
From the paper:
> In this work, we introduce a simple framework for contrastive learning of visual representations, which we call SimCLR.
Wait, that IS a NEURAL NETWORK?
## General Framework
- A stochastic data augmentation module that produces two correlated views of each example
- A neural network base encoder $f(\cdot)$ (ResNet-50 in the paper) that extracts representation vectors $h$
- A small neural network projection head $g(\cdot)$ (an MLP with one hidden layer) that maps $h$ to the space where the contrastive loss is applied
- A contrastive loss function (NT-Xent, the normalized temperature-scaled cross-entropy loss), as shown in the sketch below
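
To make the four components concrete, here is a minimal PyTorch sketch. It assumes a recent `torch`/`torchvision`; the names `SimCLRSketch` and `nt_xent` are mine, the augmentation list is abbreviated (e.g. Gaussian blur is omitted), and the hyperparameters are illustrative rather than the paper's settings.

```python
# Minimal SimCLR-style sketch (names and hyperparameters are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, transforms

# 1) Stochastic data augmentation module: two random views per image,
#    applied per sample in the data pipeline (Gaussian blur omitted for brevity).
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
])

class SimCLRSketch(nn.Module):
    def __init__(self, proj_dim=128):
        super().__init__()
        # 2) Base encoder f(.): a ResNet-50 with its classifier replaced by identity.
        backbone = models.resnet50(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()
        self.f = backbone
        # 3) Projection head g(.): small MLP mapping h to the contrastive space.
        self.g = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, proj_dim)
        )

    def forward(self, x):
        h = self.f(x)      # representation used for downstream tasks
        z = self.g(h)      # projection used only by the contrastive loss
        return F.normalize(z, dim=1)

# 4) Contrastive loss (NT-Xent): the two views of the same image are positives,
#    every other view in the batch is a negative.
def nt_xent(z1, z2, temperature=0.5):
    n = z1.shape[0]
    z = torch.cat([z1, z2], dim=0)                 # (2N, d), already L2-normalized
    sim = z @ z.t() / temperature                  # pairwise cosine similarities
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device), float('-inf'))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)           # positive = the paired view

if __name__ == "__main__":
    model = SimCLRSketch()
    x1 = torch.randn(4, 3, 224, 224)   # stand-ins for two augmented views of 4 images
    x2 = torch.randn(4, 3, 224, 224)
    print(nt_xent(model(x1), model(x2)).item())
```

The point of the sketch: the contrastive loss is computed on $z = g(f(x))$, while the representation $h = f(x)$ is what gets evaluated (linear probe) and reused downstream.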
## Novelty in SimCLR
Self-supervised contrastive learning built on the composition of strong data augmentations, a learnable nonlinear projection head, and very large batch sizes; the learned representations also transfer well to the semi-supervised setting (fine-tuning with only a few labels).
> [!TIP]
>
> In the section "Training with Large Batch Size", the authors mentioned that:
>
> To keep it simple, we do not train the model with a memory bank (Wu et al., 2018; He et al., 2019). Instead, we vary the training batch size N from 256 to 8192. A batch size of 8192 gives us 16382 negative examples per positive pair from both augmentation views.
>
> They also use the LARS optimizer to stabilize training at these large batch sizes (a quick check of the negative count is below, after this callout).
>
> What does "memory bank" mean here? And what is the LARS optimizer, and how does it benefit training?
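
The arithmetic in the quote is just in-batch counting: each of the $N$ images contributes two augmented views, and for a given positive pair every other view in the batch acts as a negative, giving $2(N-1)$ negatives. A memory bank (a stored set of feature vectors from earlier iterations, as in Wu et al., 2018) is one way to get more negatives than a single batch provides; SimCLR sidesteps it by simply making the batch huge. A quick sanity check (the helper name is mine):

```python
# In-batch negative counting for SimCLR (helper name is mine).
def in_batch_negatives(batch_size: int) -> int:
    # Each image yields two augmented views; everything except the positive pair is a negative.
    return 2 * (batch_size - 1)

print(in_batch_negatives(8192))  # 16382, the number quoted in the paper
print(in_batch_negatives(256))   # 510, at the smallest batch size they try
```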