diff --git a/content/CSE5519/CSE5519_B3.md b/content/CSE5519/CSE5519_B3.md
index aba47b0..3210880 100644
--- a/content/CSE5519/CSE5519_B3.md
+++ b/content/CSE5519/CSE5519_B3.md
@@ -1,2 +1,13 @@
 # CSE5519 Advances in Computer Vision (Topic B: 2023: Vision-Language Models)
 
+## InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
+
+[link to paper](https://arxiv.org/pdf/2305.06500)
+
+> [!TIP]
+>
+> This paper introduces InstructBLIP, a vision-language framework that is instruction-tuned to follow text instructions.
+>
+> Built on BLIP-2, it consists of three submodules: an image encoder, an LLM, and a Query Transformer (Q-Former) that bridges the two.
+>
+> The qualitative results give some hints that the model is following the text instructions, but I wonder whether this framework could also be extended to image editing and generation tasks. What might be the difficulties in migrating this framework to context-aware image generation?
\ No newline at end of file
diff --git a/content/CSE5519/CSE5519_H3.md b/content/CSE5519/CSE5519_H3.md
index 3416c63..97bd037 100644
--- a/content/CSE5519/CSE5519_H3.md
+++ b/content/CSE5519/CSE5519_H3.md
@@ -1,2 +1,13 @@
 # CSE5519 Advances in Computer Vision (Topic H: 2023: Safety, Robustness, and Evaluation of CV Models)
 
+## How to Backdoor Diffusion Models
+
+[link to paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Chou_How_to_Backdoor_Diffusion_Models_CVPR_2023_paper.pdf)
+
+> [!TIP]
+>
+> This is an interesting paper showing that it is possible to backdoor a diffusion model with high utility and high specificity at a low cost compared to pre-training.
+>
+> I wonder whether this technique could be used for AI watermarking, and whether it could be reliably detected by other AI operations.
+>
+> The paper also uses many metrics and loss functions. I wonder which objectives they are trying to optimize, and I am looking forward to more clarification in the presentation.
\ No newline at end of file