Update README.md
README.md
```diff
@@ -57,6 +57,10 @@ FlashHead matches the baseline **Llama-3.2-1B** within rounding on standard eval
 
 FlashHead improves end-to-end speed by **1.75×** over state-of-the-art, while maintaining full accuracy parity.
 
+**Measurement setup:** vLLM 0.10.2, batch_size=1, prompt length=32, max_new_tokens=128, 10 warm-up runs, averaged over 100 runs.
+
+**NVIDIA H200 measurement:** **FP8**, **512 Tokens/sec**.
+
 ---
 
 ## Accuracy (Parity with Baseline)
```
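The measurement setup added in this commit (batch_size=1, a 32-token prompt, max_new_tokens=128, 10 untimed warm-up runs, then an average over 100 timed runs) can be sketched as a small, framework-agnostic timing harness. This is a minimal sketch, not the benchmark script behind the reported numbers. `benchmark_tokens_per_sec` and `generate_fn` are hypothetical names. The harness makes two assumptions that the README does not state: "averaged over 100 runs" is read as the arithmetic mean of per-run throughput, and each timed run covers the whole generate call, prefill included.

```python
import time
from statistics import mean
from typing import Callable, Sequence

# Hypothetical callable type: takes prompt token ids and a token budget,
# runs one batch_size=1 generation, and returns how many tokens it generated.
GenerateFn = Callable[[Sequence[int], int], int]


def benchmark_tokens_per_sec(
    generate_fn: GenerateFn,
    prompt_ids: Sequence[int],
    max_new_tokens: int = 128,
    warmup_runs: int = 10,
    timed_runs: int = 100,
) -> float:
    """Return mean generation throughput in tokens/sec (assumed definition)."""
    if timed_runs < 1:
        raise ValueError("timed_runs must be at least 1")

    # Warm-up runs execute the model but are excluded from timing,
    # so one-time costs (kernel compilation, caches, allocation) are skipped.
    for _ in range(warmup_runs):
        generate_fn(prompt_ids, max_new_tokens)

    per_run_tps = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        n_generated = generate_fn(prompt_ids, max_new_tokens)
        elapsed = time.perf_counter() - start
        # Assumption: throughput = generated tokens / wall time of the full call.
        per_run_tps.append(n_generated / elapsed)

    # Assumption: "averaged over 100 runs" = mean of per-run throughput.
    return mean(per_run_tps)
```

With vLLM, `generate_fn` could wrap `LLM.generate` using `SamplingParams(max_tokens=max_new_tokens)` and return `len(outputs[0].outputs[0].token_ids)`. That wiring is illustrative only and is not taken from the README. For scale, if the reported 512 tokens/sec were computed this way, one 128-token generation would take about 128 / 512 = 0.25 s.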