# Forge-EMB-mmclip
This repository contains a multimodal embedding model based on CLIP, designed for tasks in the FORGE framework. It jointly embeds images and text into a shared vector space.
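To illustrate what a joint embedding space buys you: once an image and a caption are mapped into the same vector space, their relatedness can be scored with cosine similarity. The following is a toy sketch with hand-picked stand-in vectors, not the model's actual embeddings:

```python
import numpy as np

# Illustrative only: in a CLIP-style model, an image encoder and a text
# encoder map their inputs into the same vector space, and relatedness is
# scored by cosine similarity between the two embeddings.
def cosine_sim(a, b):
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

image_emb = np.array([0.9, 0.1, 0.0])  # stand-in for an image embedding
text_emb = np.array([0.8, 0.2, 0.1])   # stand-in for a text embedding
print(round(cosine_sim(image_emb, text_emb), 3))  # 0.984
```

A matching image/text pair yields a score near 1.0; unrelated pairs score lower.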
> Reference: This implementation is based on the `mm_clip` project from the `al_sid` repository.
## Quick Start
The easiest way to use this model is by running the provided Jupyter Notebook.
### 1. Clone the Repository
```bash
git clone https://huggingface.co/AL-GR/Forge-EMB-mmclip
cd Forge-EMB-mmclip
```
> ⚠️ **Important note on large files:** The file `clip.pth` is stored using Git LFS. If `clip.pth` is only ~134 bytes on your local machine, the actual weights were not downloaded. Make sure `git-lfs` is installed and run:
>
> ```bash
> git lfs install
> git lfs pull
> ```
>
> Alternatively, download the files directly via the "Download" button on the website.
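If you want to check programmatically whether the weights were actually fetched, note that an undownloaded LFS file is a tiny text stub that begins with a fixed signature line. A minimal sketch (the `is_lfs_pointer` helper and the demo file name are illustrative, not part of this repository):

```python
# Git LFS pointer stubs are small text files that begin with this exact line
# (per the Git LFS pointer file spec):
LFS_SIGNATURE = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path):
    """Return True if the file at `path` is still an LFS pointer stub."""
    with open(path, "rb") as f:
        return f.read(len(LFS_SIGNATURE)) == LFS_SIGNATURE

# Demonstration with a fake pointer file (stand-in for an undownloaded clip.pth):
with open("fake_pointer.pth", "wb") as f:
    f.write(b"version https://git-lfs.github.com/spec/v1\n"
            b"oid sha256:0000000000000000000000000000000000000000000000000000000000000000\n"
            b"size 1\n")
print(is_lfs_pointer("fake_pointer.pth"))  # True
```

If this returns `True` for `clip.pth`, run `git lfs pull` before trying to load the model.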
### 2. Install Dependencies
Ensure you have the necessary libraries installed (e.g., torch, transformers, jupyter).
```bash
pip install torch torchvision transformers matplotlib pillow
```
### 3. Run the Demo
Simply open and run the notebook:
```bash
jupyter notebook CLIP_demo.ipynb
```
Inside the notebook, you will find examples of how to load the model from clip.pth and process images like test.jpg.
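The loading pattern used in the notebook follows the standard PyTorch checkpoint workflow. The sketch below demonstrates that workflow with a tiny stand-in module, since the real model class lives in the notebook and `utils/`; `TinyEncoder` and `demo.pth` are purely illustrative, not part of this repository:

```python
import torch
import torch.nn as nn

# Stand-in for the repository's model class; the real architecture is
# defined in CLIP_demo.ipynb and utils/, and its weights live in clip.pth.
class TinyEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 2)

    def forward(self, x):
        return self.proj(x)

# Save a checkpoint, then restore it the same way the notebook restores
# clip.pth: instantiate the model, load the state dict, switch to eval mode.
torch.save(TinyEncoder().state_dict(), "demo.pth")

model = TinyEncoder()
state = torch.load("demo.pth", map_location="cpu")  # map_location avoids GPU requirements
model.load_state_dict(state)
model.eval()

with torch.no_grad():
    emb = model(torch.ones(1, 4))
print(emb.shape)  # torch.Size([1, 2])
```

For the real model, replace the stand-in class with the one defined in the notebook and point `torch.load` at `clip.pth`.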
## File Structure
| File/Folder | Description |
|---|---|
| `CLIP_demo.ipynb` | The main entry point. Contains inference code and visualization examples. |
| `clip.pth` | The pre-trained model weights (PyTorch checkpoint). |
| `bert/` | Configuration and tokenizer files for the text encoder. |
| `utils/` | Helper functions used by the demo. |
| `test.jpg` | A sample image for testing the model. |
## Related Resources
- Original Implementation: `selous123/al_sid/mm_clip`
- FORGE Framework: `AL-GR/FORGE`
## License
This project is licensed under the Apache License 2.0.