# Forge-EMB-mmclip
This repository contains a multimodal embedding model based on CLIP, designed for tasks in the FORGE framework. It jointly embeds images and text into a shared vector space.
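To illustrate what a joint embedding space buys you: once an image and a caption are mapped into the same vector space, their relatedness can be scored with cosine similarity. The following is a toy sketch with hand-picked stand-in vectors, not the model's actual embeddings:

```python
import numpy as np

# Illustrative only: in a CLIP-style model, an image encoder and a text
# encoder map their inputs into the same vector space, and relatedness is
# scored by cosine similarity between the two embeddings.
def cosine_sim(a, b):
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

image_emb = np.array([0.9, 0.1, 0.0])  # stand-in for an image embedding
text_emb = np.array([0.8, 0.2, 0.1])   # stand-in for a text embedding
print(round(cosine_sim(image_emb, text_emb), 3))  # 0.984
```

A matching image/text pair yields a score near 1.0; unrelated pairs score lower.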
> Reference: This implementation is based on the `mm_clip` project from the `al_sid` repository.
## Quick Start
The easiest way to use this model is by running the provided Jupyter Notebook.
### 1. Clone the Repository
```bash
git clone https://huggingface.co/AL-GR/Forge-EMB-mmclip
cd Forge-EMB-mmclip
```
> ⚠️ **Important note on large files:** The file `clip.pth` is stored using Git LFS. If `clip.pth` is only ~134 bytes on your local machine, the actual weights were not downloaded. Make sure `git-lfs` is installed and run:
>
> ```bash
> git lfs install
> git lfs pull
> ```
>
> Alternatively, download the files directly via the "Download" button on the website.
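If you want to check programmatically whether the weights were actually fetched, note that an undownloaded LFS file is a tiny text stub that begins with a fixed signature line. A minimal sketch (the `is_lfs_pointer` helper and the demo file name are illustrative, not part of this repository):

```python
# Git LFS pointer stubs are small text files that begin with this exact line
# (per the Git LFS pointer file spec):
LFS_SIGNATURE = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path):
    """Return True if the file at `path` is still an LFS pointer stub."""
    with open(path, "rb") as f:
        return f.read(len(LFS_SIGNATURE)) == LFS_SIGNATURE

# Demonstration with a fake pointer file (stand-in for an undownloaded clip.pth):
with open("fake_pointer.pth", "wb") as f:
    f.write(b"version https://git-lfs.github.com/spec/v1\n"
            b"oid sha256:0000000000000000000000000000000000000000000000000000000000000000\n"
            b"size 1\n")
print(is_lfs_pointer("fake_pointer.pth"))  # True
```

If this returns `True` for `clip.pth`, run `git lfs pull` before trying to load the model.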
### 2. Install Dependencies
Ensure you have the necessary libraries installed (e.g., torch, transformers, jupyter).
```bash
pip install torch torchvision transformers matplotlib pillow
```
### 3. Run the Demo
Simply open and run the notebook:
```bash
jupyter notebook CLIP_demo.ipynb
```
Inside the notebook, you will find examples of how to load the model from clip.pth and process images like test.jpg.
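The loading pattern used in the notebook follows the standard PyTorch checkpoint workflow. The sketch below demonstrates that workflow with a tiny stand-in module, since the real model class lives in the notebook and `utils/`; `TinyEncoder` and `demo.pth` are purely illustrative, not part of this repository:

```python
import torch
import torch.nn as nn

# Stand-in for the repository's model class; the real architecture is
# defined in CLIP_demo.ipynb and utils/, and its weights live in clip.pth.
class TinyEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 2)

    def forward(self, x):
        return self.proj(x)

# Save a checkpoint, then restore it the same way the notebook restores
# clip.pth: instantiate the model, load the state dict, switch to eval mode.
torch.save(TinyEncoder().state_dict(), "demo.pth")

model = TinyEncoder()
state = torch.load("demo.pth", map_location="cpu")  # map_location avoids GPU requirements
model.load_state_dict(state)
model.eval()

with torch.no_grad():
    emb = model(torch.ones(1, 4))
print(emb.shape)  # torch.Size([1, 2])
```

For the real model, replace the stand-in class with the one defined in the notebook and point `torch.load` at `clip.pth`.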
## File Structure
| File/Folder | Description |
|---|---|
| `CLIP_demo.ipynb` | The main entry point. Contains inference code and visualization examples. |
| `clip.pth` | The pre-trained model weights (PyTorch checkpoint). |
| `bert/` | Configuration and tokenizer files for the text encoder. |
| `utils/` | Helper functions used by the demo. |
| `test.jpg` | A sample image for testing the model. |
## Related Resources
- Original Implementation: `selous123/al_sid/mm_clip`
- FORGE Framework: `AL-GR/FORGE`
## License
This project is licensed under the Apache License 2.0.