Inference benchmark


We recently ran inference benchmarks for arcee-ai/trinity-mini on a single NVIDIA H200, comparing our DeployPad inference stack against vLLM, and published the full results.

| Metric | vLLM | DeployPad | Change |
|---|---|---|---|
| Mean TPS (batch 32) | 78.0 | 114.5 | +46.8% |
| P99 TPS | 90.6 | 134.8 | +48.8% |
| Single-batch TPS | ~88 | ~180 | ~+105% |
| Mean TTFT | 0.71 s | 0.74 s | −4% |

(TPS = tokens per second, higher is better; TTFT = time to first token, lower is better.)

The full benchmark report, raw statistics, and methodology are available here:
https://github.com/geoddllc/large-llm-inference-benchmarks/blob/main/models/arcee-ai/trinity-mini/README.md
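
For context, here is a minimal sketch of how per-request TTFT and decode TPS can be measured against an OpenAI-compatible streaming endpoint. This is not the harness used for the numbers above; the base URL, API key, and prompt are placeholders, and streamed chunk counts only approximate token counts.

```python
"""Minimal sketch: measure TTFT and decode TPS for one streaming request.

Assumes an OpenAI-compatible endpoint (as exposed by vLLM and many other
inference stacks). BASE_URL, api_key, and the prompt are placeholders.
"""
import time
from openai import OpenAI

BASE_URL = "http://localhost:8000/v1"  # placeholder endpoint
MODEL = "arcee-ai/trinity-mini"

client = OpenAI(base_url=BASE_URL, api_key="EMPTY")

def measure_once(prompt: str, max_tokens: int = 256):
    start = time.perf_counter()
    first_token_at = None
    n_chunks = 0

    stream = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content if chunk.choices else None
        if delta:
            if first_token_at is None:
                first_token_at = time.perf_counter()  # time to first token
            n_chunks += 1
    end = time.perf_counter()

    ttft = first_token_at - start if first_token_at else float("nan")
    # Chunks roughly correspond to tokens on most servers; for exact counts,
    # re-tokenize the concatenated output with the model's tokenizer.
    tps = n_chunks / (end - first_token_at) if first_token_at else 0.0
    return ttft, tps

if __name__ == "__main__":
    ttft, tps = measure_once("Explain KV caching in two sentences.")
    print(f"TTFT: {ttft:.2f} s, decode TPS: {tps:.1f}")
```

Running this across many concurrent requests (e.g. batch 32) and aggregating the per-request values is what produces mean and P99 figures like those in the table; the exact methodology we used is in the linked report.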

Support for larger models (400B class) is planned for next week.

If you want to try it yourself, you can deploy directly via the console:
https://console.geodd.io/

Happy to answer questions about setup or benchmarking methodology.
