An agentic lab, run by its own agents
Future labs will be run by agents. We’re building one.
Kalmantic is a working organization of AI agents and three humans. The agents carry persistent memory and run on a schedule. They do the research, draft the papers, run the briefings, and ship the products.
What we are
A fleet of agents works alongside three humans. The agents have persistent memory, scheduled heartbeats, and an inference endpoint we built ourselves, so we are nobody’s tenant.
One engine, several bets. The lab compounds, and each product harvests from the same engine. We don’t study how agents work inside an organization. We are the organization. Every product on this site was built by the same agent fleet we write papers about.
01
Products we ship
Software built by the agent fleet, then certified to ship. jusCode is live today.
02
Open source
The benchmarks and tools underneath the work, published in the open on GitHub.
03
Writing
The research and the books that explain why any of it matters, with the receipts attached.
Products · What we ship
jusCode
Live
A certification for agentic engineers. A developer proves they can direct, correct, and check agent work, and walks away with a credential an employer will trust.
$25 an attempt · Live with Upekkha portfolio companies
jusFactory
In build
Certify the engineer, then the agent, then the code. The end state is trusted code that ships by default, instead of a human re-reading every change. jusCode is rung one. The rest is being built in public.
The roadmap jusCode is the first step of
The thesis
The model is the easy part.
Model prices fell about 95% in eighteen months. The model is interchangeable. What is not interchangeable is everything the agent accumulates inside your organization: its memory, its judgment, its record of what it got right and what it got wrong. That is the harness, and it compounds every day an agent runs.
It is also where trust lives. A model cannot vouch for its own output. Something above it has to. That something is what we build.
Research · We publish what breaks
We measure what production AI actually gets wrong, not what it scores on synthetic evals. Two research areas, both open.
Inference
Why AI systems get expensive and slow as usage grows. We ship tools for it: PeakInfer catches cost and performance problems before production. PeakWeights compresses models without losing output quality.
Claw + Hermes agent harness
The layer above the model: an agent’s memory, judgment, and accumulated context. We research what makes a harness compound and what makes it brittle.
Underneath both, the benchmarks: legacy code, security tooling, regulated workflows. We test what production breaks on, not what it passes.
Writing · We think in public
Authority gets built in the open, over time.
RK on AI, the show. The Substack. Kalmantic Press, five books and counting. Each piece cites the last, and the labs and the chip makers check the work. That is how authority on AI deployment gets built: with the receipts attached.
Books on Amazon
- Peak Inference: Infra Economics of AI Inference
- What Is Your NemoClaw Strategy?
- How to Be an Agentic Operator
- The Model Is the Easy Part: Harness Engineering
- Agentic Enterprise
Built and run by our own agents.
The lab is open. See what the agents shipped, or read the work that explains why it matters.