# CSE5519 Advances in Computer Vision (Topic F: 2021 and before: Representation Learning)
## A Simple Framework for Contrastive Learning of Visual Representations
[link to the paper](https://arxiv.org/pdf/2002.05709)
~~Laughing my ass off when I see ~75% top-1 accuracy on ImageNet from a linear probe. I can't help wondering what the authors think a few years later, now that self-supervised representation learning has become a dominant paradigm in Computer Vision.~~
From the paper:
> In this work, we introduce a simple framework for contrastive learning of visual representations, which we call SimCLR.
Wait, that IS a NEURAL NETWORK?
## General Framework
- A stochastic data augmentation module that produces two correlated views of each example
- A neural network base encoder $f(\cdot)$ (ResNet-50 in the paper) that extracts representation vectors $h$
- A small neural network projection head $g(\cdot)$ (an MLP with one hidden layer) that maps $h$ to the space where the contrastive loss is applied
- A contrastive loss function (NT-Xent, the normalized temperature-scaled cross-entropy loss), as shown in the sketch below
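
To make the four components concrete, here is a minimal PyTorch sketch. It assumes a recent `torch`/`torchvision`; the names `SimCLRSketch` and `nt_xent` are mine, the augmentation list is abbreviated (e.g. Gaussian blur is omitted), and the hyperparameters are illustrative rather than the paper's settings.

```python
# Minimal SimCLR-style sketch (names and hyperparameters are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, transforms

# 1) Stochastic data augmentation module: two random views per image,
#    applied per sample in the data pipeline (Gaussian blur omitted for brevity).
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
])

class SimCLRSketch(nn.Module):
    def __init__(self, proj_dim=128):
        super().__init__()
        # 2) Base encoder f(.): a ResNet-50 with its classifier replaced by identity.
        backbone = models.resnet50(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()
        self.f = backbone
        # 3) Projection head g(.): small MLP mapping h to the contrastive space.
        self.g = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, proj_dim)
        )

    def forward(self, x):
        h = self.f(x)      # representation used for downstream tasks
        z = self.g(h)      # projection used only by the contrastive loss
        return F.normalize(z, dim=1)

# 4) Contrastive loss (NT-Xent): the two views of the same image are positives,
#    every other view in the batch is a negative.
def nt_xent(z1, z2, temperature=0.5):
    n = z1.shape[0]
    z = torch.cat([z1, z2], dim=0)                 # (2N, d), already L2-normalized
    sim = z @ z.t() / temperature                  # pairwise cosine similarities
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device), float('-inf'))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)           # positive = the paired view

if __name__ == "__main__":
    model = SimCLRSketch()
    x1 = torch.randn(4, 3, 224, 224)   # stand-ins for two augmented views of 4 images
    x2 = torch.randn(4, 3, 224, 224)
    print(nt_xent(model(x1), model(x2)).item())
```

The point of the sketch: the contrastive loss is computed on $z = g(f(x))$, while the representation $h = f(x)$ is what gets evaluated (linear probe) and reused downstream.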
## Novelty in SimCLR
Self-supervised contrastive learning built on the composition of strong data augmentations, a learnable nonlinear projection head, and very large batch sizes; the learned representations also transfer well to the semi-supervised setting (fine-tuning with only a few labels).
> [!TIP]
>
> In the section "Training with Large Batch Size", the authors mentioned that:
>
> To keep it simple, we do not train the model with a memory bank (Wu et al., 2018; He et al., 2019). Instead, we vary the training batch size N from 256 to 8192. A batch size of 8192 gives us 16382 negative examples per positive pair from both augmentation views.
>
> They also use the LARS optimizer to stabilize training at these large batch sizes (a quick check of the negative count is below, after this callout).
>
> What does "memory bank" mean here? And what is the LARS optimizer, and how does it benefit training?
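
The arithmetic in the quote is just in-batch counting: each of the $N$ images contributes two augmented views, and for a given positive pair every other view in the batch acts as a negative, giving $2(N-1)$ negatives. A memory bank (a stored set of feature vectors from earlier iterations, as in Wu et al., 2018) is one way to get more negatives than a single batch provides; SimCLR sidesteps it by simply making the batch huge. A quick sanity check (the helper name is mine):

```python
# In-batch negative counting for SimCLR (helper name is mine).
def in_batch_negatives(batch_size: int) -> int:
    # Each image yields two augmented views; everything except the positive pair is a negative.
    return 2 * (batch_size - 1)

print(in_batch_negatives(8192))  # 16382, the number quoted in the paper
print(in_batch_negatives(256))   # 510, at the smallest batch size they try
```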