---
title: README
emoji: π
colorFrom: pink
colorTo: red
sdk: static
pinned: true
---
FuriosaAI develops data center AI accelerators. Our RNGD (pronounced "Renegade") accelerator, currently sampling, excels at high-performance inference for LLMs and agentic AI.

Get started quickly with common inference tasks on RNGD using these pre-compiled popular Hugging Face models; no manual conversion or quantization is needed. They require Furiosa SDK 2025.2 or later on a server with an RNGD accelerator.

Need a model with custom configurations? Compile it yourself using our [Model Preparation Workflow](https://developer.furiosa.ai/latest/en/furiosa_llm/model-preparation.html) on Furiosa Docs.

Visit [Supported Models](https://developer.furiosa.ai/latest/en/overview/supported_models.html) in the SDK documentation for more information, and learn more about RNGD at https://furiosa.ai/rngd.
## Pre-compiled models

Please check out the collection of models at https://huggingface.co/furiosa-ai/collections.
| Pre-compiled Model | Description | Base Model | Supported Version |
| ------------------ | ----------- | ---------- | ----------------- |
| [furiosa-ai/bert-large-uncased-INT8-MLPerf](https://huggingface.co/furiosa-ai/bert-large-uncased-INT8-MLPerf) | INT8 quantized, optimized for MLPerf | [google-bert/bert-large-uncased](https://huggingface.co/google-bert/bert-large-uncased) | 2025.2 |
| [furiosa-ai/gpt-j-6b-FP8-MLPerf](https://huggingface.co/furiosa-ai/gpt-j-6b-FP8-MLPerf) | FP8 quantized, optimized for MLPerf | [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | 2025.2 |
| [furiosa-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Llama-8B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) | >= 2025.3 |
| [furiosa-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Llama-70B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | >= 2025.3 |
| [furiosa-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Qwen-7B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) | >= 2025.3 |
| [furiosa-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Qwen-14B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) | >= 2025.3 |
| [furiosa-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Qwen-32B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | >= 2025.3 |
| [furiosa-ai/EXAONE-3.5-7.8B-Instruct](https://huggingface.co/furiosa-ai/EXAONE-3.5-7.8B-Instruct) | BF16 | [LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct) | >= 2025.2 |
| [furiosa-ai/EXAONE-3.5-32B-Instruct](https://huggingface.co/furiosa-ai/EXAONE-3.5-32B-Instruct) | BF16 | [LGAI-EXAONE/EXAONE-3.5-32B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct) | >= 2025.2 |
| [furiosa-ai/Llama-3.1-8B-Instruct](https://huggingface.co/furiosa-ai/Llama-3.1-8B-Instruct) | BF16 | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | >= 2025.2 |
| [furiosa-ai/Llama-3.1-8B-Instruct-FP8](https://huggingface.co/furiosa-ai/Llama-3.1-8B-Instruct-FP8) | FP8 quantized | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | >= 2025.2 |
| [furiosa-ai/Llama-3.3-70B-Instruct](https://huggingface.co/furiosa-ai/Llama-3.3-70B-Instruct) | BF16 | [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | >= 2025.3 |
| [furiosa-ai/Llama-3.3-70B-Instruct-INT8](https://huggingface.co/furiosa-ai/Llama-3.3-70B-Instruct-INT8) | INT8 weight quantization | [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-7B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-7B-Instruct) | BF16 | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-14B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-14B-Instruct) | BF16 | [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-32B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-32B-Instruct) | BF16 | [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-Coder-7B-Instruct) | BF16 | [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-Coder-14B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-Coder-14B-Instruct) | BF16 | [Qwen/Qwen2.5-Coder-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-Coder-32B-Instruct) | BF16 | [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) | >= 2025.3 |
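The serve commands in the next section reference these repositories by their Hub id. If you prefer to download a pre-compiled model ahead of time, a minimal sketch using the `huggingface_hub` package looks like the following; the repo id shown is just one example from the table above.

```python
# Minimal sketch: pre-download a pre-compiled model into the standard Hub
# cache so serving does not depend on network access at startup.
# Assumes the huggingface_hub package is installed (pip install huggingface_hub).
from huggingface_hub import snapshot_download

local_path = snapshot_download(repo_id="furiosa-ai/Llama-3.1-8B-Instruct-FP8")
print(f"Model files cached at: {local_path}")
```

Pre-populating the standard Hub cache is usually sufficient; consult the SDK documentation if your deployment uses a custom cache location.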
## Examples
First, install the prerequisites by following [Installing Furiosa-LLM](https://developer.furiosa.ai/latest/en/get_started/furiosa_llm.html#installing-furiosa-llm).

Then, run the following command to start the Furiosa-LLM server with the Llama-3.1-8B-Instruct-FP8 model:

```sh
furiosa-llm serve furiosa-ai/Llama-3.1-8B-Instruct-FP8
```
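Before sending requests, you can check that the server is ready and see the exact model name it serves. The sketch below assumes the standard OpenAI-compatible `/v1/models` listing endpoint; verify the exact endpoint against the Furiosa-LLM documentation.

```python
# Sketch: poll the OpenAI-compatible model listing to confirm the server is
# up and to discover the served model name. The /v1/models endpoint is an
# assumption based on common OpenAI-compatible servers. Requires the
# requests package.
import requests

resp = requests.get("http://localhost:8000/v1/models", timeout=10)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])
```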
For reasoning models such as DeepSeek-R1-Distill-Llama-8B, you can enable reasoning mode with the appropriate reasoning parser:

```sh
furiosa-llm serve furiosa-ai/DeepSeek-R1-Distill-Llama-8B \
  --enable-reasoning --reasoning-parser deepseek_r1
```
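When the reasoning parser is active, chat responses typically separate the model's chain of thought from its final answer. The sketch below uses the `openai` client package and assumes a vLLM-style `reasoning_content` field on the message; the field name and schema are assumptions, so check the Furiosa-LLM documentation for the exact response format.

```python
# Sketch: query the reasoning model and print the chain of thought separately
# from the final answer. Assumes the openai package is installed and that the
# server returns a vLLM-style `reasoning_content` field (an assumption).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="furiosa-ai/DeepSeek-R1-Distill-Llama-8B",
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
)

message = response.choices[0].message
# Fall back to None if the server does not expose a reasoning field.
print("Reasoning:", getattr(message, "reasoning_content", None))
print("Answer:", message.content)
```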
Once your server has launched, you can query the model with input prompts:

```sh
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "EMPTY",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }' \
  | python -m json.tool
```
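The same request can be issued from Python with the `openai` client, since the server exposes an OpenAI-compatible chat completions endpoint. The sketch below also enables streaming; the model name is assumed to match the served repo id (the `/v1/models` check above shows the exact name), so treat it as a starting point rather than the documented client flow.

```python
# Sketch: the curl request above, issued from Python with streaming enabled.
# Assumes the openai package is installed and that the server supports the
# standard OpenAI streaming protocol (an assumption).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    # Assumed to match the served model; see the /v1/models check above.
    model="furiosa-ai/Llama-3.1-8B-Instruct-FP8",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```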
You can also learn more about usage in [Quick Start with Furiosa-LLM](https://developer.furiosa.ai/latest/en/get_started/furiosa_llm.html#installing-furiosa-llm).