Inference benchmark
#11, opened by Geodd
We recently ran inference benchmarks for arcee-ai/trinity-mini on a single Nvidia H200 using
our DeployPad inference stack and published the full results.
| Metric | vLLM | DeployPad | Change |
|---|---|---|---|
| Mean TPS (batch 32) | 78.0 | 114.5 | +46.8% |
| P99 TPS | 90.6 | 134.8 | +48.8% |
| Single batch TPS | ~88 | ~180 | ~+105% |
| Mean TTFT | 0.71 s | 0.74 s | +4% (slower) |
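For readers curious how numbers like these are typically derived: a minimal sketch of computing mean and P99 tokens-per-second from per-request measurements. This is not code from the linked repo; the sample values and the nearest-rank percentile choice are illustrative assumptions.

```python
import statistics


def throughput_stats(samples):
    """Compute mean and P99 tokens-per-second from per-request samples.

    samples: list of (generated_tokens, duration_seconds) tuples,
    one per completed request. Hypothetical helper, not from the repo.
    """
    tps = sorted(tokens / dur for tokens, dur in samples)
    mean_tps = statistics.mean(tps)
    # P99 via the nearest-rank method: the value at or below which
    # 99% of per-request throughput samples fall.
    p99_index = min(len(tps) - 1, round(0.99 * len(tps)) - 1)
    return mean_tps, tps[p99_index]


# Hypothetical samples: (tokens generated, wall-clock seconds per request)
samples = [(512, 6.5), (512, 6.6), (512, 6.4), (512, 7.0)]
mean_tps, p99_tps = throughput_stats(samples)
```

With only four samples the P99 collapses to the fastest request; a real run like the one above would aggregate hundreds of requests per batch configuration, and TTFT would be measured separately as the delay to the first streamed token.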
The full benchmark report, raw statistics, and methodology are available here:
https://github.com/geoddllc/large-llm-inference-benchmarks/blob/main/models/arcee-ai/trinity-mini/README.md
Support for larger models (400B class) is planned for next week.
If you want to try it yourself, you can deploy directly via the console:
https://console.geodd.io/
Happy to answer questions about setup or benchmarking methodology.