ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM
Highlights: ITBench-AA is built in partnership with @IBM and is based on their ITBench benchmark.