llm-bench dashboard

Capability ranking

Adjust capability weights

comprehension · contribution: strength 1, triage 0.27, summarization 0.22, docqa 0.2, reasoning 0.31

coding · contribution: strength 1, grade 0.6, agentic loop 0.25, instruction following 0.15
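One way to read the weights above is as a weighted mean inside each capability group, with the two group strengths then blending the group scores. The dashboard does not state its formula, so the sketch below is an assumption: the function names `weighted_score` and `capability_score` are hypothetical, and only the weight values are taken from the page.

```python
# Weights copied from the dashboard sliders.
COMPREHENSION_WEIGHTS = {
    "triage": 0.27,
    "summarization": 0.22,
    "docqa": 0.2,
    "reasoning": 0.31,
}
CODING_WEIGHTS = {
    "grade": 0.6,
    "agentic loop": 0.25,
    "instruction following": 0.15,
}


def weighted_score(scores, weights):
    """Weighted mean of per-benchmark scores, normalized by the total weight.

    Assumption: every score is on the same scale (for example 0..1) and
    higher is better. The dashboard may combine scores differently.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * scores[name] for name in weights) / total


def capability_score(comprehension, coding, comp_strength=1.0, coding_strength=1.0):
    """Blend the two group scores using the group 'strength' values (assumed rule)."""
    total = comp_strength + coding_strength
    if total <= 0:
        raise ValueError("at least one strength must be positive")
    return (comp_strength * comprehension + coding_strength * coding) / total
```

Normalizing by the total weight keeps the result on the same scale as the inputs even when a slider is moved and the weights no longer sum to 1.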

Fleet suitability

Adjust fleet & speed weights

speed: e2e throughput 0.4, cold ttft 0.45, warm ttft 0.15, decode retention 0

fleet: worker ctx parallel overhead reserve; fleet w thru 0.5

Context size

Backend A/B — vulkan vs rocm

Δ = rocm relative to vulkan: green ⇒ vulkan faster, yellow ⇒ rocm faster. Vulkan (int-dot off) is the suite default. Runs use llama-bench -fa 1 -ngl 99 -ctk q8_0 -ctv q8_0.
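The Δ column and its colors can be sketched as follows. This is a minimal illustration, assuming Δ is the fractional difference in throughput (tokens per second, higher is better); the page does not say which metric or formula it uses, and the names `relative_delta` and `delta_color` are hypothetical.

```python
def relative_delta(vulkan_tps, rocm_tps):
    """Fractional change of rocm relative to vulkan (assumed definition).

    Positive means rocm has higher throughput, negative means vulkan does.
    """
    if vulkan_tps <= 0:
        raise ValueError("vulkan throughput must be positive")
    return (rocm_tps - vulkan_tps) / vulkan_tps


def delta_color(delta):
    """Map a throughput delta to the dashboard's color convention."""
    if delta < 0:
        return "green"   # vulkan faster
    if delta > 0:
        return "yellow"  # rocm faster
    return "none"        # tie: no highlight (assumption)
```

For a lower-is-better metric such as time to first token, the sign test would have to be flipped for the same colors to mean the same thing.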

Per-model breakdown

Data sources / required runs