Instructions to use HuggingFaceM4/idefics-9b-instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use HuggingFaceM4/idefics-9b-instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceM4/idefics-9b-instruct")

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics-9b-instruct")
model = AutoModelForImageTextToText.from_pretrained("HuggingFaceM4/idefics-9b-instruct")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use HuggingFaceM4/idefics-9b-instruct with vLLM:
Install with pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "HuggingFaceM4/idefics-9b-instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "HuggingFaceM4/idefics-9b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker
```shell
docker model run hf.co/HuggingFaceM4/idefics-9b-instruct
```
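The vLLM curl call can also be scripted with Python's standard library. This is a minimal sketch: the endpoint and payload mirror the curl example above, and nothing is sent until you uncomment the `urlopen` call (which assumes the server from the previous step is running locally).

```python
import json
from urllib import request

# Build the same OpenAI-compatible completion request as the curl example above.
payload = {
    "model": "HuggingFaceM4/idefics-9b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5,
}
req = request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the vLLM server is up:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```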
- SGLang
How to use HuggingFaceM4/idefics-9b-instruct with SGLang:
Install with pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "HuggingFaceM4/idefics-9b-instruct" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "HuggingFaceM4/idefics-9b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "HuggingFaceM4/idefics-9b-instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "HuggingFaceM4/idefics-9b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use HuggingFaceM4/idefics-9b-instruct with Docker Model Runner:
```shell
docker model run hf.co/HuggingFaceM4/idefics-9b-instruct
```
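The Transformers snippet above stops at loading the model. For context, IDEFICS takes interleaved image-and-text prompts; here is a sketch of the structure the processor expects (the image URL and wording are placeholders, and the generate step is shown commented out since it requires the 9B weights):

```python
# IDEFICS prompts interleave text and images in a single list per example.
# The image URL below is a placeholder; images may also be PIL.Image objects.
prompts = [
    [
        "User: What is in this image?",
        "https://example.com/cat.jpg",
        "<end_of_utterance>",
        "\nAssistant:",
    ],
]

# With the processor and model loaded as above, inference would look like:
# inputs = processor(prompts, return_tensors="pt").to(model.device)
# generated_ids = model.generate(**inputs, max_new_tokens=50)
# print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```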
No output generated with sample code on non-quantised model
Hi and thanks for this brilliant model.
I have been running your Colab notebook and it works like a charm on Google Colab. I have also tried to reproduce it on my server with 8x NVIDIA RTX A6000. With the exact same code from the notebook I receive the exact same output:
Question: What's on the picture? Answer: Kittens.
But whatever I do, if I use the unquantised idefics-9b or idefics-9b-instruct instead of the quantised model, I only ever receive:
Question: What's on the picture? Answer:
The only difference between the Colab code and my code is the removal of quantization_config=bnb_config from the IdeficsForVisionText2Text.from_pretrained(...) parameter list. A colleague found their own way of running the model with the code you provided and independently reproduced the exact same issue (Question: What's on the picture? Answer:). I've tried different GPUs and different servers, but without the quantised model I am unable to produce any output. The model loads into memory and is accessed during inference; it just does not generate, return, or display any new tokens (I have also increased max_new_tokens to 50 and tried other prompts, such as the Pokémon example).
Any help would be appreciated.
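For readers comparing the two paths: the Colab's bnb_config is a bitsandbytes quantization config. A sketch of the working (quantised) versus failing (unquantised) loads is below; the config values are typical 4-bit settings and an assumption here, not copied from the notebook.

```python
import torch
from transformers import BitsAndBytesConfig, IdeficsForVisionText2Text

# A typical bitsandbytes 4-bit setup for IDEFICS demos
# (assumed values; check the notebook you are running).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Quantised load (the working case):
# model = IdeficsForVisionText2Text.from_pretrained(
#     "HuggingFaceM4/idefics-9b-instruct",
#     quantization_config=bnb_config,
#     device_map="auto",
# )

# Unquantised load (the failing case) simply drops quantization_config;
# loading in half precision is the usual alternative:
# model = IdeficsForVisionText2Text.from_pretrained(
#     "HuggingFaceM4/idefics-9b-instruct",
#     torch_dtype=torch.bfloat16,
#     device_map="auto",
# )
```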
Hi @Pwicke ,
That indeed does not sound right.
Could you say more about your environment? In particular, your transformers and tokenizers versions?
I'll try to reproduce the error.
Thank you for your response.
- accelerate 0.24.0.dev0
- bitsandbytes 0.41.1
- nvidia-cublas-cu12 12.1.3.1
- python 3.10.12
- sentencepiece 0.1.99
- tokenizers 0.14.1
- torch 2.1.0
- transformers 4.35.0.dev0
Upgrading transformers to 4.37 can solve this problem.