arxiv:2403.04343

Adaptive Task Balancing for Visual Instruction Tuning via Inter-Task Contribution and Intra-Task Difficulty

Published on Mar 7, 2024

Authors:

Abstract

Adaptive task balancing method for visual instruction tuning addresses performance imbalances by measuring inter-task contribution and intra-task difficulty to prioritize tasks based on their impact and difficulty levels.

AI-generated summary

Visual instruction tuning is a key training stage of large multimodal models. However, when learning multiple visual tasks simultaneously, this approach often results in suboptimal and imbalanced overall performance due to latent knowledge conflicts across tasks. To mitigate this issue, we propose a novel Adaptive Task Balancing approach tailored for visual instruction tuning (VisATB). Specifically, we measure two critical dimensions for visual task balancing based on validation performance: (1) Inter-Task Contribution, the mechanism where learning one task enhances the performance on others owing to shared knowledge across tasks, and (2) Intra-Task Difficulty, which denotes the inherent learning difficulty of a single task. Furthermore, we propose prioritizing three categories of tasks with greater weight: those that offer substantial contributions to others, those that receive minimal contributions from others, and those that present high learning difficulties. Among these three task weighting strategies, the first and third focus on improving overall performance, and the second targets the mitigation of performance imbalance. Extensive experiments on three benchmarks demonstrate that our VisATB approach consistently achieves superior and more balanced overall performance in visual instruction tuning. The data, code, and models are available at https://github.com/YanqiDai/VisATB.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2403.04343 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2403.04343 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2403.04343 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.