# Prepare Datasets for PSALM

The training process of PSALM has two stages: the first stage is vision-language alignment, and the second stage is joint training on multiple segmentation tasks. We use a custom dataset format to enable joint training. We assume that all dataset root paths are under `/datasets`.

## First stage training

We follow LLaVA's training strategy; see [here](https://github.com/haotian-liu/LLaVA/blob/main/docs/Data.md#pretraining-dataset) for detailed dataset preparation.

## Second stage joint training

The second stage joint training of PSALM covers four different tasks: Generic Segmentation, Referring Segmentation, Interactive Segmentation, and Vision-Language Tasks. We use COCO Panoptic for Generic Segmentation, RefCOCO/+/g for Referring Segmentation, COCO-Interactive for Interactive Segmentation, and LLaVA-1.5's training data for Vision-Language Tasks.

(Optional) We also support LVIS for PSALM's second stage joint training.

### Expected dataset structure for [COCO](https://cocodataset.org/#download):

```
coco/
  annotations/
    instances_{train,val}2017.json
    panoptic_{train,val}2017.json
  {train,val}2017/
    # image files that are mentioned in the corresponding json
  panoptic_{train,val}2017/  # png annotations
  panoptic_semseg_{train,val}2017/  # generated by the script mentioned below
```

Install panopticapi by:

```
pip install git+https://github.com/cocodataset/panopticapi.git
```

Run `python datasets/build_COCO_instance.py` to get the dataset format for COCO instance segmentation.

Run `python datasets/prepare_coco_semantic_annos_from_panoptic_annos.py` to extract semantic annotations from panoptic annotations (only used for evaluation).

### Expected dataset structure for [RefCOCO/+/g](https://github.com/lichengunc/refer):

```
refer_seg/
  refcoco/
    instances.json
    merged_google.json
    refs(google).p
    refs(unc).p
  refcoco+/
    instances.json
    refs(unc).p
  refcocog/
    instances.json
    refs(google).p
    refs(umd).p
  images/
    mscoco/
      train2014/
```

Run `python datasets/build_RefCOCO.py` to get the dataset format for joint training.

### Dataset preparation for COCO-Interactive:

We build COCO-Interactive on top of the COCO instance annotations, so make sure to follow the [COCO](#expected-dataset-structure-for-coco) preparation instructions first.

Run `python datasets/build_COCO_Interactivate.py` to get the dataset format for joint training.

You can also directly download the converted COCO-Interactive files from [Google Drive](https://drive.google.com/file/d/1EcC1tl1OQRgIqqy7KFG7JZz2KHujAQB3/view?usp=sharing) | [Baidu Cloud](https://pan.baidu.com/s/1NRGJGkJDUGn8CU-sU5ScOg). The detailed format of the downloaded files is described [here](https://github.com/zamling/PSALM/blob/main/docs/DATASET.md#download-converted-dataset-files).

### Dataset preparation for LLaVA-1.5 training data:

Please download the images and the [annotation file](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json) following the [LLaVA-1.5](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#visual-instruction-tuning) stage 2 training instructions.

```
# No need to download COCO again
gqa/
  images/
ocr_vqa/
  images/
textvqa/
  train_images/
vg/
  VG_100K/
  VG_100K_2/
llava_v1_5_mix665k.json
```

Since the LLaVA-1.5 dataset contains text-only samples, run `python datasets/prepare_llava_1_5.py` to filter them out (a sketch of this step follows below). Remember to change the paths in `prepare_llava_1_5.py` to your dataset paths.
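To illustrate what the filtering step does, here is a minimal sketch of removing text-only samples; the input/output paths are assumptions, and the actual logic lives in `datasets/prepare_llava_1_5.py`:

```python
import json

# Sketch: drop samples that have no associated image.
# Paths are assumptions -- adjust them to your dataset layout.
with open("/datasets/llava_v1_5_mix665k.json") as f:
    samples = json.load(f)

# Text-only samples in llava_v1_5_mix665k.json have no "image" field.
multimodal = [s for s in samples if "image" in s]

print(f"kept {len(multimodal)} of {len(samples)} samples")
with open("/datasets/llava_v1_5_mix665k_filtered.json", "w") as f:
    json.dump(multimodal, f)
```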
### (Optional) Expected dataset structure for [LVIS](https://www.lvisdataset.org/dataset):

We only use the LVIS dataset for training. If you have already downloaded the COCO images, you only need to download the LVIS annotations.

```
lvis/
  {train,val}2017/  # not needed if you already have the COCO images
  lvis_v1_train.json
  lvis_v1_val.json
```

Run `python datasets/build_lvis.py` to get the dataset format for joint training.

## Zero-shot evaluation on other datasets

PSALM shows strong zero-shot capability on many unseen tasks: Open-Vocabulary Segmentation, Generalized Referring Segmentation, and Video Object Segmentation.

### Dataset preparation for Open-Vocabulary Segmentation:

We follow [these instructions](https://github.com/bytedance/fc-clip/blob/main/datasets/README.md#expected-dataset-structure-for-cityscapes) to prepare Cityscapes, ADE20K, Pascal VOC, and Pascal Context.

### Expected dataset structure for [gRefCOCO](https://github.com/henghuiding/gRefCOCO):

Download the gRefCOCO dataset from this [link](https://entuedu-my.sharepoint.com/personal/liuc0058_e_ntu_edu_sg/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fliuc0058%5Fe%5Fntu%5Fedu%5Fsg%2FDocuments%2Fopensource%2FGRES%2Fdataset&ga=1) and put it in the same folder as RefCOCO.

```
refer_seg/
  grefcoco/
    grefs(unc).json
    instances.json
  refcoco/
  refcoco+/
  refcocog/
```

Run `python datasets/build_gRefCOCO.py` to get the dataset format for evaluation.

### Expected dataset structure for [DAVIS-2017](https://davischallenge.org/davis2017/code.html):

```
DAVIS/
  2017/
    trainval/
      Annotations/
        480p/  # one folder per video
      ImageSets/
        2017/
          train.txt
          val.txt
      JPEGImages/
        480p/  # one folder per video
```

Run `python datasets/build_DAVIS.py` to get the dataset format for evaluation.

### Download Converted Dataset Files

You can download the converted files from [Google Drive](https://drive.google.com/file/d/1EcC1tl1OQRgIqqy7KFG7JZz2KHujAQB3/view?usp=sharing) | [Baidu Cloud](https://pan.baidu.com/s/1NRGJGkJDUGn8CU-sU5ScOg) (code: hust). The downloaded files should be in the following structure:

```
refcoco/
  refcoco_val.json
  refcoco_testA.json
  ...
refcoco+/
  refcoco+_val.json
  refcoco+_testA.json
  ...
refcocog/
  refcocog_val.json
  refcocog_test.json
  ...
grefcoco/
  refcocog_val.json
  refcocog_testA.json
  refcocog_testB.json
coco_interactive_train_psalm.json  # training set for interactive COCO
coco_interactive_val_psalm.json    # val set for interactive COCO
instruction_dataset_coco_format.json       # GT for COCO instance; put this file in psalm/output/instance_segmentation
instruction_dataset_coco_format.json.lock  # put this file in psalm/output/instance_segmentation
instance_train_psalm.json  # training set for COCO instance
instance_val_psalm.json    # val set for COCO instance
trainval_val_psalm.json    # val set for DAVIS
```
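After downloading, a quick sanity check like the following can confirm the converted files are in place. This is just a sketch, assuming the files were unpacked under `/datasets` as shown above:

```python
from pathlib import Path

# Sketch of a sanity check for the converted dataset files.
# The root path is an assumption -- point it at wherever you unpacked the archive.
root = Path("/datasets")

expected = [
    "refcoco/refcoco_val.json",
    "refcoco+/refcoco+_val.json",
    "refcocog/refcocog_val.json",
    "coco_interactive_train_psalm.json",
    "coco_interactive_val_psalm.json",
    "instance_train_psalm.json",
    "instance_val_psalm.json",
    "trainval_val_psalm.json",
]

missing = [p for p in expected if not (root / p).exists()]
print("all converted files found" if not missing else f"missing: {missing}")
```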