# Prepare Datasets for PSALM
The training process of PSALM has two stages: the first stage performs visual-language alignment, and the second stage jointly trains multiple segmentation tasks.
We use a custom dataset format to enable joint training. We assume that all dataset root paths are under `/datasets`.
## First stage training
We follow LLaVA's training strategy; see [here](https://github.com/haotian-liu/LLaVA/blob/main/docs/Data.md#pretraining-dataset) for detailed dataset preparation.
## Second stage joint training
The second stage joint training of PSALM covers four different tasks: Generic Segmentation, Referring Segmentation, Interactive Segmentation, and Visual-Language Tasks.
We use COCO Panoptic for Generic Segmentation, RefCOCO/+/g for Referring Segmentation, COCO-Interactive for Interactive Segmentation, and LLaVA-1.5's training data for Visual-Language Tasks.
(Optional) We also support LVIS for PSALM's second stage joint training.
### Expected dataset structure for [COCO](https://cocodataset.org/#download):
```
coco/
  annotations/
    instances_{train,val}2017.json
    panoptic_{train,val}2017.json
  {train,val}2017/
    # image files that are mentioned in the corresponding json
  panoptic_{train,val}2017/  # png annotations
  panoptic_semseg_{train,val}2017/  # generated by the script mentioned below
```
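Before running the conversion scripts below, you can sanity-check this layout. A minimal sketch, assuming the `/datasets` root from the top of this document:
```python
from pathlib import Path

# Check that every expected COCO file and folder is in place.
root = Path("/datasets/coco")
expected = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "annotations/panoptic_train2017.json",
    "annotations/panoptic_val2017.json",
    "train2017",
    "val2017",
    "panoptic_train2017",
    "panoptic_val2017",
]
for rel in expected:
    status = "ok" if (root / rel).exists() else "MISSING"
    print(f"{status:7s} {root / rel}")
```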
Install panopticapi by:
```
pip install git+https://github.com/cocodataset/panopticapi.git
```
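Once installed, panopticapi can decode the panoptic PNGs into per-pixel segment ids via its standard `rgb2id` helper. A minimal sketch (file names are illustrative):
```python
import json

import numpy as np
from PIL import Image
from panopticapi.utils import rgb2id

# Decode one panoptic PNG into an HxW array of segment ids, then look
# up each segment's category from the matching json entry.
with open("/datasets/coco/annotations/panoptic_val2017.json") as f:
    panoptic = json.load(f)

ann = panoptic["annotations"][0]
png = Image.open(f"/datasets/coco/panoptic_val2017/{ann['file_name']}")
seg_ids = rgb2id(np.array(png))

for seg in ann["segments_info"]:
    print(seg["category_id"], int((seg_ids == seg["id"]).sum()), "pixels")
```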
Run `python datasets/build_COCO_instance.py` to get the dataset format for COCO instance segmentation.
Run `python datasets/prepare_coco_semantic_annos_from_panoptic_annos.py` to extract semantic annotations from panoptic annotations (only used for evaluation).
### Expected dataset structure for [RefCOCO/+/g](https://github.com/lichengunc/refer):
```
refseg/
  refcoco/
    instances.json
    merged_google.json
    refs(google).p
    refs(unc).p
  refcoco+/
    instances.json
    refs(unc).p
  refcocog/
    instances.json
    refs(google).p
    refs(umd).p
  images/
    mscoco/
      train2014/
```
Run `python datasets/build_RefCOCO.py` to get the dataset format for joint training.
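To take a quick look at the raw refer files before conversion, you can inspect one entry. This sketch assumes the pickle layout of the refer toolkit (a list of ref dicts, each with a `sentences` list):
```python
import pickle

# Peek at one referring expression from refs(unc).p; key names follow
# the refer toolkit (https://github.com/lichengunc/refer).
with open("/datasets/refseg/refcoco/refs(unc).p", "rb") as f:
    refs = pickle.load(f)

ref = refs[0]
print(ref["image_id"], ref["ann_id"], ref["split"])
for sent in ref["sentences"]:
    print(" -", sent["sent"])
```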
### Dataset preparation for COCO-Interactive:
We build COCO-Interactive upon COCO-Instance, so make sure to follow the [COCO](#expected-dataset-structure-for-coco) preparation instructions first.
Run `python datasets/build_COCO_Interactivate.py` to get the dataset format for joint training.
Alternatively, you can directly download the converted COCO-Interactive files from [Google Drive](https://drive.google.com/file/d/1EcC1tl1OQRgIqqy7KFG7JZz2KHujAQB3/view?usp=sharing) | [Baidu Cloud](https://pan.baidu.com/s/1NRGJGkJDUGn8CU-sU5ScOg). The detailed format of the downloaded files is described [here](https://github.com/zamling/PSALM/blob/main/docs/DATASET.md#download-converted-dataset-files).
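For intuition, interactive segmentation pairs each instance mask with a visual prompt such as a point or a box. The sketch below shows one simple way such prompts can be derived from a binary mask; the `sample_prompts` helper is hypothetical and illustrates the idea only, not the conversion script's exact logic:
```python
import numpy as np

def sample_prompts(mask: np.ndarray, rng: np.random.Generator):
    """Derive a random point prompt and a tight box prompt from a
    binary instance mask (HxW bool). Illustrative only."""
    ys, xs = np.nonzero(mask)
    i = rng.integers(len(xs))
    point = (int(xs[i]), int(ys[i]))  # an (x, y) click inside the object
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))  # x0, y0, x1, y1
    return point, box

rng = np.random.default_rng(0)
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 10:30] = True
print(sample_prompts(mask, rng))
```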
### Dataset preparation for LLaVA-1.5 training data:
Please download the images and the [annotation file](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json) following the [LLaVA-1.5](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#visual-instruction-tuning) stage 2 training instructions.
```
# No need to download COCO again
gqa/
  images/
ocr_vqa/
  images/
textvqa/
  train_images/
vg/
  VG_100K/
  VG_100K_2/
llava_v1_5_mix665k.json
```
Since the LLaVA-1.5 dataset contains text-only samples, run `python datasets/prepare_llava_1_5.py` to filter them out. Remember to change the paths in `prepare_llava_1_5.py` to your own dataset paths.
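The filtering idea itself is simple: text-only entries in `llava_v1_5_mix665k.json` carry no `image` field, so keeping only entries that have one removes them. A minimal sketch (paths, including the output file name, are placeholders):
```python
import json

# Keep only samples that reference an image; text-only entries in
# llava_v1_5_mix665k.json have no "image" field.
with open("/datasets/llava_v1_5_mix665k.json") as f:
    data = json.load(f)

with_image = [d for d in data if "image" in d]
print(f"kept {len(with_image)} of {len(data)} samples")

with open("/datasets/llava_v1_5_mix665k_filtered.json", "w") as f:
    json.dump(with_image, f)
```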
### (Optional) Expected dataset structure for [LVIS](https://www.lvisdataset.org/dataset):
We only use the LVIS dataset for training. If you have already downloaded the COCO images, you only need to download the LVIS annotations.
```
lvis/
  {train,val}2017/
    # Since you already have the COCO images, there is no need to download these
  lvis_v1_train.json
  lvis_v1_val.json
```
Run `python datasets/build_lvis.py` to get the dataset format for joint training.
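LVIS reuses COCO images, and each image entry in the LVIS v1 json records its original `coco_url`, so local paths can be resolved against the COCO folders you already have. A minimal sketch:
```python
import json
from pathlib import Path

# Each LVIS v1 image entry records a "coco_url" such as
# http://images.cocodataset.org/train2017/000000391895.jpg, so the
# split and file name can be recovered from its last two components.
coco_root = Path("/datasets/coco")
with open("/datasets/lvis/lvis_v1_train.json") as f:
    lvis = json.load(f)

img = lvis["images"][0]
split, name = img["coco_url"].split("/")[-2:]
print(coco_root / split / name)
```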
## Zero-shot evaluation for other datasets
PSALM shows strong zero-shot capability on several unseen tasks: Open-Vocabulary Segmentation, Generalized Referring Segmentation, and Video Object Segmentation.
### Dataset preparation for Open-Vocabulary Segmentation:
We follow the instructions [here](https://github.com/bytedance/fc-clip/blob/main/datasets/README.md#expected-dataset-structure-for-cityscapes) to prepare Cityscapes, ADE20k, Pascal VOC, and Pascal Context.
### Expected dataset structure for [gRefCOCO](https://github.com/henghuiding/gRefCOCO):
Download the gRefCOCO dataset from this [link](https://entuedu-my.sharepoint.com/personal/liuc0058_e_ntu_edu_sg/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fliuc0058%5Fe%5Fntu%5Fedu%5Fsg%2FDocuments%2Fopensource%2FGRES%2Fdataset&ga=1) and put it in the same folder as RefCOCO.
```
refseg/
  grefcoco/
    grefs(unc).json
    instances.json
  refcoco/
  refcoco+/
  refcocog/
```
Run `python datasets/build_gRefCOCO.py` to get the dataset format for evaluation.
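To take a quick look at the generalized expressions, you can load `grefs(unc).json` directly. This sketch assumes a refer-style layout stored as JSON (a plausible reading of the gRefCOCO release, not a documented guarantee), hence the defensive `.get` lookups:
```python
import json

# Each entry should resemble a refer-style ref dict; in gRefCOCO an
# expression may match several instances or none (the "no-target" case).
with open("/datasets/refseg/grefcoco/grefs(unc).json") as f:
    grefs = json.load(f)

ref = grefs[0]
print(ref.get("image_id"), ref.get("split"))
for sent in ref.get("sentences", []):
    print(" -", sent.get("sent"))
```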
### Expected dataset structure for [DAVIS-2017](https://davischallenge.org/davis2017/code.html):
```
DAVIS/
  2017/
    trainval/
      Annotations/
        480p/
          # one folder per video
      ImageSets/
        2017/
          train.txt
          val.txt
      JPEGImages/
        480p/
          # one folder per video
```
Run `python datasets/build_DAVIS.py` to get the dataset format for evaluation.
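As a quick check that the layout is correct, the sketch below walks the first val video and pairs each frame with its annotation. It assumes the `/datasets` root; DAVIS masks are palettized PNGs (PIL mode `"P"`):
```python
from pathlib import Path
from PIL import Image

# Pair each JPEG frame with its per-pixel annotation PNG.
davis = Path("/datasets/DAVIS/2017/trainval")
videos = (davis / "ImageSets/2017/val.txt").read_text().split()

video = videos[0]  # first video only, for illustration
for jpg in sorted((davis / "JPEGImages/480p" / video).glob("*.jpg")):
    ann = davis / "Annotations/480p" / video / (jpg.stem + ".png")
    frame, mask = Image.open(jpg), Image.open(ann)
    print(video, jpg.name, frame.size, mask.mode)  # mask.mode == "P"
```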
### Download Converted Dataset Files
You can download the converted files from [Google Drive](https://drive.google.com/file/d/1EcC1tl1OQRgIqqy7KFG7JZz2KHujAQB3/view?usp=sharing) | [Baidu Cloud](https://pan.baidu.com/s/1NRGJGkJDUGn8CU-sU5ScOg) (code: hust).
The downloaded files should be organized in the following structure:
```
refcoco/
  refcoco_val.json
  refcoco_testA.json
  ...
refcoco+/
  refcoco+_val.json
  refcoco+_testA.json
  ...
refcocog/
  refcocog_val.json
  refcocog_test.json
  ...
grefcoco/
  grefcoco_val.json
  grefcoco_testA.json
  grefcoco_testB.json
coco_interactive_train_psalm.json  # training set for COCO-Interactive
coco_interactive_val_psalm.json  # val set for COCO-Interactive
instruction_dataset_coco_format.json  # GT for COCO instance; put this file in psalm/output/instance_segmentation
instruction_dataset_coco_format.json.lock  # put this file in psalm/output/instance_segmentation
instance_train_psalm.json  # training set for COCO instance
instance_val_psalm.json  # val set for COCO instance
trainval_val_psalm.json  # val set for DAVIS
```
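The two `instruction_dataset_coco_format.json*` files must end up under `psalm/output/instance_segmentation`, as noted above. A small helper to put them there; the source directory is a hypothetical stand-in for wherever you unpacked the download:
```python
import shutil
from pathlib import Path

# Copy the GT file and its lock file to where evaluation expects them.
src = Path("/datasets/psalm_converted")  # hypothetical unpack location
dst = Path("psalm/output/instance_segmentation")
dst.mkdir(parents=True, exist_ok=True)
for name in ("instruction_dataset_coco_format.json",
             "instruction_dataset_coco_format.json.lock"):
    shutil.copy2(src / name, dst / name)
```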