Kalmantic Labs

Benchmarks

Domain-specific evaluations for autonomous agents. Filter by domain to explore active and upcoming benchmarks.

LegacyCodeBench

Legacy Code

Evaluating how well AI systems understand and modernize legacy code.

120 Tasks
8 Models
0.623 Top Score

AutoBench

Automotive

Autonomous agent evaluation across automotive scenarios.

48 Tasks
12 Models
0.847 Top Score

TravelBench

Travel

Measuring agent capabilities in travel planning and operations.

64 Tasks
10 Models
0.751 Top Score

WealthBench

Finance

Agent evaluation for wealth management and financial advisory.

Coming Soon

KiranaBench

Kirana

Benchmarking agents for small-format retail and kirana store operations.

Coming Soon

WholesaleBench

Wholesale

Evaluating agents across wholesale distribution and supply chain workflows.

Coming Soon

HospitalBench

Healthcare

Agent evaluation for hospital operations and clinical decision support.

Coming Soon

ClinicBench

Clinic

Benchmarking agents for outpatient clinic workflows and patient management.

Coming Soon

RetailOutletBench

Retail

Evaluating agents for retail outlet management and customer operations.

Coming Soon

TransportOperatorBench

Transport

Agent evaluation for transport and logistics operator workflows.

Coming Soon