# Prepare Datasets for PSALM
PSALM is trained in two stages: the first is visual-language alignment, and the second is joint training on multiple segmentation tasks.
We use a custom dataset to enable joint training and assume that all dataset root paths are under `/datasets`.
## First stage training
We follow LLaVA's training strategy, see [here](https://github.com/haotian-liu/LLaVA/blob/main/docs/Data.md#pretraining-dataset) for a detailed dataset preparation.
## Second stage joint training
The second stage joint training of PSALM contains four different tasks: Generic Segmentation, Referring Segmentation, Interactive Segmentation, and Visual-Language Tasks.
We use COCO Panoptic for Generic Segmentation, RefCOCO/+/g for Referring Segmentation, COCO-Interactive for Interactive Segmentation and LLaVA-1.5's training data for Visual-Language Tasks.
(Optional) We also support LVIS for PSALM's second stage joint training.
### Expected dataset structure for [COCO](https://cocodataset.org/#download):
```
coco/
annotations/
instances_{train,val}2017.json
panoptic_{train,val}2017.json
{train,val}2017/
# image files that are mentioned in the corresponding json
panoptic_{train,val}2017/ # png annotations
panoptic_semseg_{train,val}2017/ # generated by the script mentioned below
```
Install panopticapi by:
```
pip install git+https://github.com/cocodataset/panopticapi.git
```
Run `python datasets/build_COCO_instance.py` to generate the dataset format for COCO instance segmentation.
Run `python datasets/prepare_coco_semantic_annos_from_panoptic_annos.py` to extract semantic annotations from panoptic annotations (used only for evaluation).
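For reference, the panoptic PNGs encode each segment id directly in the pixel colours; extracting per-pixel ids is a small decoding step. Below is a minimal sketch of the `rgb2id` decoding that panopticapi uses (the helper name and formula follow panopticapi; the sample pixel values are made up for illustration):

```python
import numpy as np

def rgb2id(color):
    """Decode a panoptic PNG pixel (R, G, B) into a segment id.

    panopticapi packs the segment id into the three colour channels:
    id = R + 256 * G + 256**2 * B
    """
    color = np.asarray(color, dtype=np.uint32)
    return color[..., 0] + 256 * color[..., 1] + 256 ** 2 * color[..., 2]

# A single pixel with colour (1, 2, 0) belongs to segment 1 + 256*2 = 513.
pixel = np.array([1, 2, 0], dtype=np.uint8)
print(int(rgb2id(pixel)))  # 513
```

The same function works on a full `(H, W, 3)` annotation image and returns an `(H, W)` id map.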
### Expected dataset structure for [RefCOCO/+/g](https://github.com/lichengunc/refer):
```
refseg/
refcoco/
instances.json
merged_google.json
refs(google).p
refs(unc).p
refcoco+/
instances.json
refs(unc).p
refcocog/
instances.json
refs(google).p
            refs(umd).p
images/
mscoco/
train2014/
```
Run `python datasets/build_RefCOCO.py` to generate the dataset format for joint training.
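The `refs(*).p` files are pickled lists of referring-expression records, each carrying a `split` field (train/val/testA/...). A minimal sketch of splitting them apart, assuming the commonly used key names from the `refer` toolkit (the sample records below are hypothetical, for illustration only):

```python
import pickle
from collections import defaultdict

def group_refs_by_split(refs):
    """Group referring expressions by their dataset split (train/val/testA/...)."""
    by_split = defaultdict(list)
    for ref in refs:
        by_split[ref["split"]].append(ref)
    return dict(by_split)

# Loading the real file would look like (path follows the tree above):
# with open("/datasets/refseg/refcoco/refs(unc).p", "rb") as f:
#     refs = pickle.load(f)

# Hypothetical minimal records standing in for the real pickle:
refs = [
    {"ref_id": 0, "image_id": 1, "split": "train",
     "sentences": [{"sent": "the left zebra"}]},
    {"ref_id": 1, "image_id": 1, "split": "val",
     "sentences": [{"sent": "zebra on the right"}]},
]
splits = group_refs_by_split(refs)
print(sorted(splits))  # ['train', 'val']
```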
### Dataset preparation for COCO-Interactive:
We build COCO-Interactive upon COCO-Instance, so make sure to follow the [COCO](#expected-dataset-structure-for-coco) preparation instructions first.
Run `python datasets/build_COCO_Interactivate.py` to generate the dataset format for joint training.
Alternatively, you can directly download the converted COCO-Interactive files from [Google Drive](https://drive.google.com/file/d/1EcC1tl1OQRgIqqy7KFG7JZz2KHujAQB3/view?usp=sharing) | [Baidu Cloud](https://pan.baidu.com/s/1NRGJGkJDUGn8CU-sU5ScOg). The detailed format of the downloaded files is described [here](https://github.com/zamling/PSALM/blob/main/docs/DATASET.md#download-converted-dataset-files).
### Dataset preparation for LLaVA-1.5 training data:
Please download the images and [annotation](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json) following [llava 1.5](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#visual-instruction-tuning) stage 2 training instruction.
```
# Do not need to download COCO again
gqa/
images/
ocr_vqa/
images/
textvqa/
train_images/
vg/
VG_100K/
VG_100K_2/
llava_v1_5_mix665k.json
```
Since the LLaVA-1.5 dataset contains text-only samples, run `python datasets/prepare_llava_1_5.py` to filter them out. Remember to change the paths in `prepare_llava_1_5.py` to your own dataset paths.
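The filtering step amounts to dropping every record in `llava_v1_5_mix665k.json` that has no image attached. A minimal sketch of the idea (the actual `prepare_llava_1_5.py` may do more, such as rewriting paths; the mini samples below are hypothetical):

```python
import json

def filter_text_only(samples):
    """Keep only samples that reference an image.

    In the LLaVA-1.5 mixture, text-only records simply lack the 'image' key.
    """
    return [s for s in samples if "image" in s]

# Hypothetical mini mixture standing in for llava_v1_5_mix665k.json:
samples = [
    {"id": "a", "image": "coco/train2017/000000000001.jpg", "conversations": []},
    {"id": "b", "conversations": []},  # text-only -> dropped
]
kept = filter_text_only(samples)
print(len(kept))  # 1

# With the real file:
# with open("/datasets/llava_v1_5_mix665k.json") as f:
#     samples = json.load(f)
# with open("llava_v1_5_filtered.json", "w") as f:
#     json.dump(filter_text_only(samples), f)
```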
### (Optional) Expected dataset structure for [LVIS](https://www.lvisdataset.org/dataset):
We use the LVIS dataset for training only. If you have already downloaded the COCO images, you only need to download the LVIS annotations.
```
lvis/
  {train,val}2017/
    # since you already have the COCO images, there is no need to download these
lvis_v1_train.json
lvis_v1_val.json
```
Run `python datasets/build_lvis.py` to generate the dataset format for joint training.
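Note that LVIS v1 image entries reference the COCO images through a `coco_url` field rather than a plain per-split `file_name`, so the local path is usually derived from the URL. A hedged sketch under that assumption (adjust `root` to your layout; the sample record mirrors the public LVIS v1 format):

```python
def lvis_image_path(image_record, root="/datasets/lvis"):
    """Derive the local image path from an LVIS v1 image record.

    LVIS v1 image entries carry a 'coco_url' such as
    http://images.cocodataset.org/train2017/000000391895.jpg;
    the last two URL components give the split and file name.
    """
    split, file_name = image_record["coco_url"].split("/")[-2:]
    return f"{root}/{split}/{file_name}"

record = {"id": 391895,
          "coco_url": "http://images.cocodataset.org/train2017/000000391895.jpg"}
print(lvis_image_path(record))  # /datasets/lvis/train2017/000000391895.jpg
```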
## Zero-shot evaluation for other datasets
PSALM shows strong zero-shot capability on many unseen tasks: Open-Vocabulary Segmentation, Generalized Referring Segmentation, and Video Object Segmentation.
### Dataset preparation for Open-Vocabulary Segmentation:
We follow the instructions [here](https://github.com/bytedance/fc-clip/blob/main/datasets/README.md#expected-dataset-structure-for-cityscapes) for the preparation of Cityscapes, ADE20K, Pascal VOC, and Pascal Context.
### Expected dataset structure for [gRefCOCO](https://github.com/henghuiding/gRefCOCO):
Download the gRefCOCO dataset from this [link](https://entuedu-my.sharepoint.com/personal/liuc0058_e_ntu_edu_sg/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fliuc0058%5Fe%5Fntu%5Fedu%5Fsg%2FDocuments%2Fopensource%2FGRES%2Fdataset&ga=1) and put it in the same folder as RefCOCO:
```
refer_seg/
grefcoco/
grefs(unc).json
instances.json
refcoco/
refcoco+/
refcocog/
```
Run `python datasets/build_gRefCOCO.py` to generate the dataset format for evaluation.
### Expected dataset structure for [DAVIS-2017](https://davischallenge.org/davis2017/code.html):
```
DAVIS/
2017/
trainval/
Annotations/
480p/
# name for each video
ImageSets/
2017/
train.txt
val.txt
JPEGImages/
480p/
# name for each video
```
Run `python datasets/build_DAVIS.py` to generate the dataset format for evaluation.
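In the DAVIS layout above, the `JPEGImages` and `Annotations` trees mirror each other, so locating the mask for a frame is a simple path substitution. A minimal sketch under that assumption (the helper name is hypothetical):

```python
from pathlib import Path

def frame_to_mask(frame_path):
    """Map a DAVIS JPEG frame to its PNG annotation.

    JPEGImages/480p/<video>/<frame>.jpg -> Annotations/480p/<video>/<frame>.png
    """
    p = Path(frame_path)
    parts = [("Annotations" if part == "JPEGImages" else part) for part in p.parts]
    return str(Path(*parts).with_suffix(".png"))

frame = "DAVIS/2017/trainval/JPEGImages/480p/bear/00000.jpg"
print(frame_to_mask(frame))
# DAVIS/2017/trainval/Annotations/480p/bear/00000.png
```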
### Download Converted Dataset Files
You can download the converted files from [Google Drive](https://drive.google.com/file/d/1EcC1tl1OQRgIqqy7KFG7JZz2KHujAQB3/view?usp=sharing) | [Baidu Cloud](https://pan.baidu.com/s/1NRGJGkJDUGn8CU-sU5ScOg) (code: hust).
The downloaded files should be in the following structure:
```
refcoco/
refcoco_val.json
refcoco_testA.json
...
refcoco+/
refcoco+_val.json
refcoco+_testA.json
...
refcocog/
refcocog_val.json
refcocog_test.json
...
grefcoco/
refcocog_val.json
refcocog_testA.json
refcocog_testB.json
coco_interactive_train_psalm.json # training set for interactive COCO
coco_interactive_val_psalm.json # val set for interactive COCO
instruction_dataset_coco_format.json # GT for COCO instance; put this file in psalm/output/instance_segmentation
instruction_dataset_coco_format.json.lock # put this file in psalm/output/instance_segmentation
instance_train_psalm.json # training set for COCO instance
instance_val_psalm.json # val set for COCO instance
trainval_val_psalm.json # val set for DAVIS
```