Model Lab
Research and tools for model optimization — from MoE architectures to weight compression for production inference.
Model Research
Optimization for Production Inference
MoE routing, weight quantization, KV-cache compression, speculative decoding. These techniques determine whether a model costs $0.01 or $0.10 per request. We research what works in practice and publish the results.
MoE Architectures
Research on Mixture of Experts models and their implications for inference economics, routing efficiency, and production deployment.
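The routing question at the heart of MoE economics is small enough to show directly. Below is a minimal sketch of top-k expert gating (a toy illustration, not code from our library): each token's hidden state is scored against every expert, the k best experts are selected, and their scores are renormalized into mixing weights. The gate matrix and sizes here are made up for the example.

```python
import numpy as np

def top_k_route(x, gate_w, k=2):
    """Toy top-k MoE gate: score every expert per token, keep the k best,
    and softmax their scores into mixing weights."""
    logits = x @ gate_w                                   # (tokens, experts)
    top = np.argpartition(logits, -k, axis=-1)[:, -k:]    # ids of k best experts
    scores = np.take_along_axis(logits, top, axis=-1)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)             # weights sum to 1 per token
    return top, weights

rng = np.random.default_rng(0)
ids, w = top_k_route(rng.normal(size=(4, 16)), rng.normal(size=(16, 8)), k=2)
```

With k=2 of 8 experts active, each token touches a quarter of the expert parameters per layer, which is where the inference-cost advantage of sparse routing comes from.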
Weight Optimization
Techniques for model compression, quantization, and pruning that maintain quality while reducing inference costs.
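As a concrete instance of the trade-off, here is a minimal sketch of symmetric per-tensor int8 quantization, the simplest member of this family (an illustration only, not our production method): weights are scaled into the int8 range, stored at a quarter of float32 size, and dequantized on the fly with bounded rounding error.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map weights into [-127, 127]
    with a single scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()   # rounding error is at most scale / 2
```

Per-channel scales, asymmetric ranges, and calibration on real activations all refine this basic recipe, but the storage and error accounting works the same way.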
Inference Optimization
Batch scheduling, speculative decoding, KV-cache strategies, and serving configurations that reduce latency and cost in production.
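Of these, speculative decoding is the easiest to sketch in a few lines. The toy below shows the greedy variant only (an illustration, not the full rejection-sampling scheme, and both "models" here are invented stand-ins): a cheap draft model proposes k tokens, and the target model keeps the longest agreeing prefix, supplying its own token at the first disagreement.

```python
def speculative_step(draft_model, target_model, prefix, k=4):
    """One greedy speculative-decoding step: draft proposes k tokens,
    target keeps the agreeing prefix plus its own correction at the
    first mismatch. (In a real system the target verifies all k
    proposals in a single batched forward pass.)"""
    ctx = list(prefix)
    proposed = []
    for _ in range(k):
        proposed.append(draft_model(ctx + proposed))
    accepted = []
    for t in proposed:
        want = target_model(ctx + accepted)
        accepted.append(want)
        if want != t:
            break          # first mismatch: stop accepting draft tokens
    return accepted

# Toy models over the repeating string "abc": the target is exact,
# the draft always guesses "a", so it agrees one time in three.
target = lambda ctx: "abc"[len(ctx) % 3]
draft = lambda ctx: "a"
```

When the draft agrees often, several tokens are produced per target pass; when it does not, the step degrades gracefully to ordinary one-token decoding.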
Papers & Reports
Related Publications
PeakWeights: Weight Optimization Techniques for Efficient Model Deployment
Kalmantic Labs
Weight optimization techniques for production model deployment. Quantization, pruning, and compression methods that maintain output quality at lower inference cost.
Inference Optimization and MoE Models for Production Systems
Kalmantic Labs
Deep research into inference optimization strategies, Mixture of Experts model architectures, and their practical implications for AI safety, AI harness design, and autonomous agent deployment.
PeakWeights
Weight Optimization Library
Our open-source weight optimization library for efficient model deployment. Research-backed techniques for model compression, quantization, and inference optimization.
View on GitHub · Open Source
Everything we build is public
Benchmarks, tools, and research. All on GitHub. Contributions welcome.