# Prepare Datasets for PSALM

The training process of PSALM has two stages: the first stage is vision-language alignment, and the second stage is joint training on multiple segmentation tasks. We use a custom dataset format to enable joint training. We assume that all dataset root paths are under `/datasets`.

## First stage training

We follow LLaVA's training strategy; see [here](https://github.com/haotian-liu/LLaVA/blob/main/docs/Data.md#pretraining-dataset) for detailed dataset preparation.

## Second stage joint training

The second stage joint training of PSALM covers four different tasks: Generic Segmentation, Referring Segmentation, Interactive Segmentation, and Vision-Language Tasks. We use COCO Panoptic for Generic Segmentation, RefCOCO/+/g for Referring Segmentation, COCO-Interactive for Interactive Segmentation, and LLaVA-1.5's training data for Vision-Language Tasks.

(Optional) We also support LVIS for PSALM's second stage joint training.

### Expected dataset structure for [COCO](https://cocodataset.org/#download):

```
coco/
  annotations/
    instances_{train,val}2017.json
    panoptic_{train,val}2017.json
  {train,val}2017/
    # image files that are mentioned in the corresponding json
  panoptic_{train,val}2017/  # png annotations
  panoptic_semseg_{train,val}2017/  # generated by the script mentioned below
```

Install panopticapi by:

```
pip install git+https://github.com/cocodataset/panopticapi.git
```

Run `python datasets/build_COCO_instance.py` to get the dataset format for COCO instance segmentation.

Run `python datasets/prepare_coco_semantic_annos_from_panoptic_annos.py` to extract semantic annotations from panoptic annotations (only used for evaluation).

### Expected dataset structure for [RefCOCO/+/g](https://github.com/lichengunc/refer):

```
refer_seg/
  refcoco/
    instances.json
    merged_google.json
    refs(google).p
    refs(unc).p
  refcoco+/
    instances.json
    refs(unc).p
  refcocog/
    instances.json
    refs(google).p
    refs(umd).p
  images/
    mscoco/
      train2014/
```

Run `python datasets/build_RefCOCO.py` to get the dataset format for joint training.

### Dataset preparation for COCO-Interactive:

We build COCO-Interactive on top of the COCO instance annotations, so make sure to follow the [COCO](#expected-dataset-structure-for-coco) preparation instructions first.

Run `python datasets/build_COCO_Interactivate.py` to get the dataset format for joint training.

You can also directly download the converted COCO-Interactive files from [Google Drive](https://drive.google.com/file/d/1EcC1tl1OQRgIqqy7KFG7JZz2KHujAQB3/view?usp=sharing) | [Baidu Cloud](https://pan.baidu.com/s/1NRGJGkJDUGn8CU-sU5ScOg). The detailed format of the downloaded files is described [here](https://github.com/zamling/PSALM/blob/main/docs/DATASET.md#download-converted-dataset-files).

### Dataset preparation for LLaVA-1.5 training data:

Please download the images and the [annotation file](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json) following the [LLaVA-1.5](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#visual-instruction-tuning) stage 2 training instructions.

```
# No need to download COCO again
gqa/
  images/
ocr_vqa/
  images/
textvqa/
  train_images/
vg/
  VG_100K/
  VG_100K_2/
llava_v1_5_mix665k.json
```

Since the LLaVA-1.5 dataset contains text-only samples, run `python datasets/prepare_llava_1_5.py` to filter them out (a sketch of this step follows below). Remember to change the paths in `prepare_llava_1_5.py` to your dataset paths.
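To illustrate what the filtering step does, here is a minimal sketch of removing text-only samples; the input/output paths are assumptions, and the actual logic lives in `datasets/prepare_llava_1_5.py`:

```python
import json

# Sketch: drop samples that have no associated image.
# Paths are assumptions -- adjust them to your dataset layout.
with open("/datasets/llava_v1_5_mix665k.json") as f:
    samples = json.load(f)

# Text-only samples in llava_v1_5_mix665k.json have no "image" field.
multimodal = [s for s in samples if "image" in s]

print(f"kept {len(multimodal)} of {len(samples)} samples")
with open("/datasets/llava_v1_5_mix665k_filtered.json", "w") as f:
    json.dump(multimodal, f)
```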
### (Optional) Expected dataset structure for [LVIS](https://www.lvisdataset.org/dataset):

We only use the LVIS dataset for training. If you have already downloaded the COCO images, you only need to download the LVIS annotations.

```
lvis/
  {train,val}2017/  # not needed if you already have the COCO images
  lvis_v1_train.json
  lvis_v1_val.json
```

Run `python datasets/build_lvis.py` to get the dataset format for joint training.

## Zero-shot evaluation on other datasets

PSALM shows strong zero-shot capability on many unseen tasks: Open-Vocabulary Segmentation, Generalized Referring Segmentation, and Video Object Segmentation.

### Dataset preparation for Open-Vocabulary Segmentation:

We follow [these instructions](https://github.com/bytedance/fc-clip/blob/main/datasets/README.md#expected-dataset-structure-for-cityscapes) to prepare Cityscapes, ADE20K, Pascal VOC, and Pascal Context.

### Expected dataset structure for [gRefCOCO](https://github.com/henghuiding/gRefCOCO):

Download the gRefCOCO dataset from this [link](https://entuedu-my.sharepoint.com/personal/liuc0058_e_ntu_edu_sg/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fliuc0058%5Fe%5Fntu%5Fedu%5Fsg%2FDocuments%2Fopensource%2FGRES%2Fdataset&ga=1) and put it in the same folder as RefCOCO.

```
refer_seg/
  grefcoco/
    grefs(unc).json
    instances.json
  refcoco/
  refcoco+/
  refcocog/
```

Run `python datasets/build_gRefCOCO.py` to get the dataset format for evaluation.

### Expected dataset structure for [DAVIS-2017](https://davischallenge.org/davis2017/code.html):

```
DAVIS/
  2017/
    trainval/
      Annotations/
        480p/  # one folder per video
      ImageSets/
        2017/
          train.txt
          val.txt
      JPEGImages/
        480p/  # one folder per video
```

Run `python datasets/build_DAVIS.py` to get the dataset format for evaluation.

### Download Converted Dataset Files

You can download the converted files from [Google Drive](https://drive.google.com/file/d/1EcC1tl1OQRgIqqy7KFG7JZz2KHujAQB3/view?usp=sharing) | [Baidu Cloud](https://pan.baidu.com/s/1NRGJGkJDUGn8CU-sU5ScOg) (code: hust). The downloaded files should be in the following structure:

```
refcoco/
  refcoco_val.json
  refcoco_testA.json
  ...
refcoco+/
  refcoco+_val.json
  refcoco+_testA.json
  ...
refcocog/
  refcocog_val.json
  refcocog_test.json
  ...
grefcoco/
  refcocog_val.json
  refcocog_testA.json
  refcocog_testB.json
coco_interactive_train_psalm.json  # training set for interactive COCO
coco_interactive_val_psalm.json    # val set for interactive COCO
instruction_dataset_coco_format.json       # GT for COCO instance; put this file in psalm/output/instance_segmentation
instruction_dataset_coco_format.json.lock  # put this file in psalm/output/instance_segmentation
instance_train_psalm.json  # training set for COCO instance
instance_val_psalm.json    # val set for COCO instance
trainval_val_psalm.json    # val set for DAVIS
```
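After downloading, a quick sanity check like the following can confirm the converted files are in place. This is just a sketch, assuming the files were unpacked under `/datasets` as shown above:

```python
from pathlib import Path

# Sketch of a sanity check for the converted dataset files.
# The root path is an assumption -- point it at wherever you unpacked the archive.
root = Path("/datasets")

expected = [
    "refcoco/refcoco_val.json",
    "refcoco+/refcoco+_val.json",
    "refcocog/refcocog_val.json",
    "coco_interactive_train_psalm.json",
    "coco_interactive_val_psalm.json",
    "instance_train_psalm.json",
    "instance_val_psalm.json",
    "trainval_val_psalm.json",
]

missing = [p for p in expected if not (root / p).exists()]
print("all converted files found" if not missing else f"missing: {missing}")
```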