# CSE5519 Advances in Computer Vision (Topic F: 2022: Representation Learning)

## Masked Autoencoders Are Scalable Vision Learners

[link to the paper](https://openaccess.thecvf.com/content/CVPR2022/papers/He_Masked_Autoencoders_Are_Scalable_Vision_Learners_CVPR_2022_paper.pdf)

### Novelty in MAE

#### Masked Autoencoders

Masked autoencoders are a type of autoencoder that masks out part of the input data and trains the model to reconstruct the original data. For best performance, MAE masks out 75% of the input patches.

A masked autoencoder with a single-block decoder can perform strongly after fine-tuning.

Because the encoder processes only the visible (unmasked) patches, this method speeds up training by a factor of 3-4.

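The uniform random masking step can be sketched as follows. This is an illustrative NumPy sketch of the idea, not the authors' implementation; the function and variable names are my own.

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, rng=None):
    """MAE-style uniform random masking (illustrative sketch).

    `patches` has shape (num_patches, dim). Returns the visible
    patches, a binary mask (1 = masked, 0 = visible), and the
    indices of the kept patches.
    """
    rng = rng or np.random.default_rng()
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    # Shuffle patch indices and keep the first n_keep (uniform sampling).
    ids_shuffle = rng.permutation(n)
    ids_keep = ids_shuffle[:n_keep]
    visible = patches[ids_keep]
    # The decoder is later asked to reconstruct the patches where mask == 1.
    mask = np.ones(n)
    mask[ids_keep] = 0
    return visible, mask, ids_keep

# Example: a 14x14 grid of patches, each a 768-dim embedding.
patches = np.random.randn(196, 768)
visible, mask, ids_keep = random_masking(patches, mask_ratio=0.75)
```

With a 75% mask ratio, the encoder sees only 49 of the 196 patches, which is where the training speedup comes from.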
> [!TIP]
>
> This paper presents a new way to train a vision model using masked autoencoders. The authors mask out 75% of the input patches and train the model to reconstruct the original image, motivated by the insight that image data is highly redundant compared to text data under the transformer architecture.
>
> Currently, the sampling method is uniform and simple, yet yields surprising results. I wonder whether a better sampling method, for example sampling weighted by the information entropy of each patch, would yield better results.
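
The entropy-weighted sampling idea in the note above could be sketched as follows. This is a hypothetical variant, not something from the paper: patches with higher pixel-intensity entropy are kept visible with higher probability.

```python
import numpy as np

def patch_entropy(patch, bins=32):
    """Shannon entropy of a patch's pixel-intensity histogram,
    used here as a simple proxy for information content.
    Assumes pixel values lie in [0, 1)."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def entropy_weighted_masking(patches, mask_ratio=0.75, rng=None):
    """Sample visible patches without replacement, with probability
    proportional to each patch's entropy (hypothetical variant)."""
    rng = rng or np.random.default_rng(0)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    weights = np.array([patch_entropy(p) for p in patches])
    probs = weights / weights.sum()
    ids_keep = rng.choice(n, size=n_keep, replace=False, p=probs)
    # Binary mask: 1 = masked (to be reconstructed), 0 = visible.
    mask = np.ones(n)
    mask[ids_keep] = 0
    return ids_keep, mask
```

One open question with this variant is that MAE's uniform masking is spatially unbiased; entropy weighting would systematically mask low-texture regions, which may make reconstruction easier rather than harder.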