Commit 844dc16 by WilhelmT · verified · 1 parent: 579b0dd

Update README.md

Files changed (1)
  1. README.md +4 -0
README.md CHANGED
@@ -57,6 +57,10 @@ FlashHead matches the baseline **Llama-3.2-1B** within rounding on standard eval
 
 FlashHead improves end-to-end speed by **1.75×** over state-of-the-art, while maintaining full accuracy parity.
 
+**Measurement setup:** vLLM 0.10.2, batch_size=1, prompt length=32, max_new_tokens=128, 10 warm-up runs, averaged over 100 runs.
+
+**NVIDIA H200 measurement:** **FP8**, **512 Tokens/sec**.
+
 ---
 
 ## Accuracy (Parity with Baseline)
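
The measurement-setup line added in this commit describes a warm-up-then-average protocol (10 discarded warm-up runs, then 100 timed runs at batch_size=1). The following is a minimal Python sketch of such a protocol, not the authors' benchmark script. The function name `measure_throughput` and the `generate` callable are hypothetical, and it assumes "averaged over 100 runs" means total generated tokens divided by total timed wall-clock seconds.

```python
import time
from typing import Callable


def measure_throughput(
    generate: Callable[[], int],
    warmup_runs: int = 10,
    timed_runs: int = 100,
) -> float:
    """Return average generated tokens/sec over `timed_runs` timed calls.

    `generate` is a hypothetical zero-argument callable that runs one
    batch_size=1 request (e.g. a 32-token prompt, max_new_tokens=128) and
    returns how many tokens it actually generated.
    """
    if timed_runs < 1:
        raise ValueError("timed_runs must be >= 1")

    # Warm-up: run and discard, so one-off costs (graph capture, kernel
    # compilation, cache allocation) do not skew the timed runs.
    for _ in range(warmup_runs):
        generate()

    total_tokens = 0
    total_seconds = 0.0
    for _ in range(timed_runs):
        start = time.perf_counter()
        total_tokens += generate()
        total_seconds += time.perf_counter() - start

    if total_seconds <= 0.0:
        raise ValueError("timer resolution too coarse to measure these runs")
    # Mean tokens per run divided by mean latency per run
    # (equivalently, total tokens over total time).
    return total_tokens / total_seconds
```

With vLLM, `generate` might wrap `llm.generate([prompt], SamplingParams(max_tokens=128, ignore_eos=True))` and return `len(outputs[0].outputs[0].token_ids)`. That wiring, including `ignore_eos`, is an assumption. A speedup figure such as 1.75× would presumably be the ratio of two such throughput measurements taken under the same setup.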