sad
@@ -1,2 +1,17 @@
|
|||||||
# CSE5519 Advances in Computer Vision (Topic B: 2024: Vision-Language Models)
|
# CSE5519 Advances in Computer Vision (Topic B: 2024: Vision-Language Models)
|
||||||
|
|
||||||
|
## Improved Baselines with Visual Instruction Tuning (LLaVA-1.5)
|
||||||
|
|
||||||
|
[link to the paper](https://openaccess.thecvf.com/content/CVPR2024/papers/Liu_Improved_Baselines_with_Visual_Instruction_Tuning_CVPR_2024_paper.pdf)
|
||||||
|
|
||||||
|
This paper shows that visual instruction tuning can improve the performance of vision-language models.
|
||||||
|
|
||||||
|
### Novelty in LLaVA-1.5
|
||||||
|
|
||||||
|
1. Scaling to high-resolution images by dividing each image into a grid of patches while maintaining data efficiency.
|
||||||
|
2. Compositional ability: training on long-form language reasoning together with shorter visual reasoning can improve the model's writing ability.
|
||||||
|
3. Randomly downsampling the training data does not significantly degrade performance.
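
The grid idea in item 1 can be sketched as follows. This is only a minimal illustration, not the paper's implementation: the helper name `grid_boxes` is hypothetical, the default tile size of 336 is an assumption chosen to match the input size of a 336px vision encoder, and edge tiles are simply clipped, whereas a real pipeline would more likely pad or resize the image first.

```python
import math


def grid_boxes(width, height, tile=336):
    """Return (left, top, right, bottom) crop boxes that cover the image.

    Hypothetical helper: boxes are listed in row-major order so that each
    tile can be encoded separately at the encoder's native resolution.
    """
    if width <= 0 or height <= 0 or tile <= 0:
        raise ValueError("width, height and tile must be positive")
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * tile, r * tile
            # Edge tiles are clipped to the image bounds (assumption:
            # no padding or resizing is applied in this sketch).
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes


# A 672x672 image splits into a 2x2 grid of 336x336 tiles.
print(len(grid_boxes(672, 672)))  # prints 4
```

Each box could then be cropped and passed through the vision encoder on its own, with the per-tile features merged afterwards; how that merge is done is left out here.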
|
||||||
|
|
||||||
|
>[!TIP]
|
||||||
|
>
|
||||||
|
> This paper shows that LLaVA-1.5 obeys the scaling law and splits high-resolution images into grids to maintain data efficiency. I wonder why this method is not applicable to multi-image understanding tasks. Why can't we assign an index embedding to each image and feed the whole image set to the model for better understanding?
|
||||||
@@ -15,14 +15,15 @@
|
|||||||
"@vercel/analytics": "^1.5.0",
|
"@vercel/analytics": "^1.5.0",
|
||||||
"@vercel/speed-insights": "^1.2.0",
|
"@vercel/speed-insights": "^1.2.0",
|
||||||
"cross-env": "^7.0.3",
|
"cross-env": "^7.0.3",
|
||||||
|
"eslint-config-next": "^16.0.1",
|
||||||
"katex": "^0.16.22",
|
"katex": "^0.16.22",
|
||||||
"next": "^15.5.2",
|
"next": "^16.0.1",
|
||||||
"next-sitemap": "^4.2.3",
|
"next-sitemap": "^4.2.3",
|
||||||
"nextra": "^4.2.17",
|
"nextra": "^4.2.17",
|
||||||
"nextra-theme-docs": "^4.2.17",
|
"nextra-theme-docs": "^4.2.17",
|
||||||
"pagefind": "^1.4.0",
|
"pagefind": "^1.4.0",
|
||||||
"react": "^19.1.0",
|
"react": "^19.2.0",
|
||||||
"react-dom": "^19.1.0"
|
"react-dom": "^19.2.0"
|
||||||
},
|
},
|
||||||
"devDependencies": {
|
"devDependencies": {
|
||||||
"@types/node": "24.0.10",
|
"@types/node": "24.0.10",
|
||||||