Research
Applied AI research combining benchmarks, inference optimization, and open-source publishing.
Research Philosophy
Moving AI from research into production surfaces problems that benchmarks do not capture. Our approach combines applied research with open-source publishing: we build tools and benchmarks that help organizations own their data, own their harness, and own their intelligence.
Industry Benchmarking
Domain-specific evaluations for autonomous agents across automotive, legacy code, finance, healthcare, and more.
Inference Optimization
Research on Mixture of Experts (MoE) models, weight optimization, and techniques for efficient AI deployment at scale.
AI Safety & Harness
Building the right harness and designing benchmarks that measure AI safety in production environments.
Papers & Reports
Publications
LegacyCodeBench: A Benchmark for Evaluating AI Agents on Real-World Legacy Modernization
Kalmantic Labs
We introduce LegacyCodeBench, a comprehensive benchmark for evaluating how well AI systems understand and modernize legacy code in COBOL, Fortran, and enterprise Java under real-world production constraints.
PeakWeights: Weight Optimization Techniques for Efficient Model Deployment
Kalmantic Labs
Research on weight optimization techniques for efficient model deployment, bridging the gap between AI research and production systems.
Inference Optimization and MoE Models for Production Systems
Kalmantic Labs
An in-depth study of inference optimization strategies and Mixture of Experts (MoE) architectures, and of their practical implications for AI safety, harness design, and autonomous agent deployment.
Beyond Benchmarks: Measuring Real-World Impact of Autonomous Agents
Kalmantic Labs
A framework for collecting and analyzing real-world feedback on how autonomous agents impact humans, workflows, and organizational structures across industries.