Inference benchmark


We recently ran inference benchmarks for arcee-ai/trinity-mini on a single NVIDIA H200, comparing our DeployPad inference stack against vLLM, and published the full results.

| Metric | vLLM | DeployPad | Change |
|---|---|---|---|
| Mean TPS (batch 32) | 78.0 | 114.5 | +46.8% |
| P99 TPS | 90.6 | 134.8 | +48.8% |
| Single-batch TPS | ~88 | ~180 | ~+105% |
| Mean TTFT | 0.71 s | 0.74 s | −4% |

(TPS = tokens per second, higher is better; TTFT = time to first token, lower is better.)

The full benchmark report, raw statistics, and methodology are available here:
https://github.com/geoddllc/large-llm-inference-benchmarks/blob/main/models/arcee-ai/trinity-mini/README.md
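
For context, here is a minimal sketch of how per-request TTFT and decode TPS can be measured against an OpenAI-compatible streaming endpoint. This is not the harness used for the numbers above; the base URL, API key, and prompt are placeholders, and streamed chunk counts only approximate token counts.

```python
"""Minimal sketch: measure TTFT and decode TPS for one streaming request.

Assumes an OpenAI-compatible endpoint (as exposed by vLLM and many other
inference stacks). BASE_URL, api_key, and the prompt are placeholders.
"""
import time
from openai import OpenAI

BASE_URL = "http://localhost:8000/v1"  # placeholder endpoint
MODEL = "arcee-ai/trinity-mini"

client = OpenAI(base_url=BASE_URL, api_key="EMPTY")

def measure_once(prompt: str, max_tokens: int = 256):
    start = time.perf_counter()
    first_token_at = None
    n_chunks = 0

    stream = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content if chunk.choices else None
        if delta:
            if first_token_at is None:
                first_token_at = time.perf_counter()  # time to first token
            n_chunks += 1
    end = time.perf_counter()

    ttft = first_token_at - start if first_token_at else float("nan")
    # Chunks roughly correspond to tokens on most servers; for exact counts,
    # re-tokenize the concatenated output with the model's tokenizer.
    tps = n_chunks / (end - first_token_at) if first_token_at else 0.0
    return ttft, tps

if __name__ == "__main__":
    ttft, tps = measure_once("Explain KV caching in two sentences.")
    print(f"TTFT: {ttft:.2f} s, decode TPS: {tps:.1f}")
```

Running this across many concurrent requests (e.g. batch 32) and aggregating the per-request values is what produces mean and P99 figures like those in the table; the exact methodology we used is in the linked report.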

Support for larger models (400B class) is planned for next week.

If you want to try it yourself, you can deploy directly via the console:
https://console.geodd.io/

Happy to answer questions about setup or benchmarking methodology.
