Hotel management
Hotel operations and workforce management: employee profiles, facilities work orders, inventory, badge lifecycle, certification compliance, shift scheduling, and payroll and benefits.
128 test cases
32 agent tools
Domain agentic intelligence index
We test models on private, non-contaminated tasks.
Here's what we found.
Composite pass^5 score across tool-use evaluations (higher is better).
Error bars show 95% confidence intervals.
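The page does not say how the 95% intervals are computed. As a rough sketch only, one common approach is a percentile bootstrap over tasks: resample the per-task scores with replacement and read off the 2.5th and 97.5th percentiles of the resampled means. The function name `bootstrap_ci`, its parameters, and the sample scores are all hypothetical.

```python
import random


def bootstrap_ci(per_task_scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the mean of per-task scores.

    Assumption: tasks are the resampling unit. The benchmark may use a
    different method; this is illustrative only.
    """
    if not per_task_scores:
        raise ValueError("need at least one task score")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(per_task_scores)
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(per_task_scores, k=n)  # draw n tasks with replacement
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Made-up data: 32 of 128 tasks passed in all five runs (pass^5 = 0.25).
scores = [1.0] * 32 + [0.0] * 96
low, high = bootstrap_ci(scores)
```

Resampling whole tasks keeps the runs of a task together, which matters because pass^5 is defined per task rather than per run.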
Scaling curves
K = 1…5 runs
pass^k — Consistency
Percentage of tasks passed in every one of k runs.
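As a minimal sketch of the pass^k definition above, the code below averages, over tasks, the fraction of k-run subsets in which every run passed. With k equal to the number of runs, this reduces to "passed in all runs". The exact estimator used by the benchmark is not stated on the page, so the combinatorial form, the function name `pass_hat_k`, and the sample results are assumptions.

```python
from math import comb


def pass_hat_k(results, k):
    """Estimate pass^k from per-task run outcomes.

    results maps a task id to a list of booleans, one per run.
    For a task with c passes out of n runs, comb(c, k) / comb(n, k) is the
    share of k-run subsets in which every run passed.
    """
    if not results:
        raise ValueError("need at least one task")
    total = 0.0
    for runs in results.values():
        n = len(runs)
        if not 1 <= k <= n:
            raise ValueError("k must be between 1 and the number of runs")
        c = sum(runs)  # number of passing runs for this task
        total += comb(c, k) / comb(n, k)
    return total / len(results)


# Hypothetical outcomes for four tasks, five runs each.
example = {
    "task_a": [True, True, True, True, True],
    "task_b": [True, False, True, True, True],
    "task_c": [False, False, False, False, False],
    "task_d": [True, True, False, False, True],
}
curve = {k: pass_hat_k(example, k) for k in range(1, 6)}
```

In this made-up example only `task_a` passes all five runs, so pass^5 is 0.25 while pass^1 is about 0.6. The curve can only stay flat or fall as k grows, which is why it reads as a consistency measure.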
Task difficulty distribution
Tasks bucketed by aggregate success rate
Buckets show difficulty tiers based on the aggregate of all models' results.
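100%: 4 of 128 tasks (6%)
75%+: 1 of 128 tasks (2%)
50%+: 11 of 128 tasks (17%)
25%+: 46 of 128 tasks (71%)
0%: 3 of 128 tasks (5%)
Note: these five buckets total 65 tasks, and the percentages shown match shares of those 65 (for example, 46/65 ≈ 71%) rather than shares of all 128.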
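The bucketing described above can be sketched as a threshold lookup on each task's aggregate success rate. The tier labels come from the page. The `<25%` tier for rates strictly between 0 and 25% is an assumption, since the page lists no bucket for that range, and the names `difficulty_tier` and `tier_counts` are hypothetical.

```python
from collections import Counter


def difficulty_tier(success_rate):
    """Map an aggregate success rate in [0, 1] to a difficulty tier label."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    if success_rate == 1.0:
        return "100%"
    if success_rate >= 0.75:
        return "75%+"
    if success_rate >= 0.50:
        return "50%+"
    if success_rate >= 0.25:
        return "25%+"
    if success_rate == 0.0:
        return "0%"
    return "<25%"  # assumed tier: not listed on the page


def tier_counts(rates_by_task):
    """Count how many tasks fall into each tier."""
    return Counter(difficulty_tier(rate) for rate in rates_by_task.values())


# Made-up aggregate success rates for seven tasks.
rates = {"t1": 1.0, "t2": 0.8, "t3": 0.5, "t4": 0.3, "t5": 0.26, "t6": 0.1, "t7": 0.0}
counts = tier_counts(rates)
```

Checking the exact endpoints (1.0 and 0.0) separately keeps "solved by every model every time" and "never solved" distinct from the threshold tiers.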