Kalmantic Labs — AI Research and Tools for Production Systems

Research

Applied AI research combining benchmarks, inference optimization, and open-source publishing.

Research Philosophy

Moving AI from research into production surfaces problems that benchmarks do not capture. Our approach combines applied research with open-source publishing: we build tools and benchmarks that help organizations own their data, own their harness, and own their intelligence.

Industry Benchmarking

Domain-specific evaluations for autonomous agents across automotive, legacy code, finance, healthcare, and other industries.

Inference Optimization

Research on MoE models, weight optimization, and techniques for efficient AI deployment at scale.

AI Safety & Harness

Building the right harness and designing benchmarks that measure AI safety in production environments.

Papers & Reports

Publications

2025 Paper

LegacyCodeBench: A Benchmark for Evaluating AI Agents on Real-World Legacy Modernization

Kalmantic Labs

We introduce LegacyCodeBench, a comprehensive benchmark for evaluating how well AI systems understand and modernize legacy code across COBOL, Fortran, and enterprise Java systems with real-world production constraints.

Read paper

2025 Paper

PeakWeights: Weight Optimization Techniques for Efficient Model Deployment

Kalmantic Labs

Research on weight optimization techniques for efficient model deployment, bridging the gap between AI research and production systems.

Read paper

2026 Research

Inference Optimization and MoE Models for Production Systems

Kalmantic Labs

Deep research into inference optimization strategies, Mixture of Experts model architectures, and their practical implications for AI safety, AI harness design, and autonomous agent deployment.

Read paper

Coming Soon Upcoming

Beyond Benchmarks: Measuring Real-World Impact of Autonomous Agents

Kalmantic Labs

A framework for collecting and analyzing real-world feedback on how autonomous agents impact humans, workflows, and organizational structures across industries.

Read paper