Contract a model
Contraction is the inverse of Fusion: it shrinks a model you already have by identifying and pruning its lowest-value neurons, producing a smaller model that keeps nearly all of the original's quality. It's how you trade a little headroom for cheaper, faster inference.
When to use it
- Cut inference cost — a 10–25% smaller model is cheaper and faster to serve, for every request, forever.
- Fit a tighter budget — bring a model down to a size that runs on smaller hardware.
- After a growth run — grow aggressively to learn, then contract to ship a lean production model. Grow-then-contract is a common pipeline.
Contraction needs no dataset — it operates on the model's own structure.
Run one
bash
curl https://console.axomlabs.ai/api/jobs \
-H "Authorization: Bearer $AXOM_KEY" \
-H "Content-Type: application/json" \
-d '{
"type": "contract",
"idempotency_key": "contract-support-bot-1",
"hyperparams": {
"source_model_id": "<model-id>",
"contraction_ratio": 0.5,
"num_layers_to_contract": 8
}
}'You get back a queued job, same shape as a training run. The smaller model appears in your account on completion.
Settings
| Setting | Default | Range | What it controls |
|---|---|---|---|
source_model_id | — (required) | — | The trained model to shrink. The result is a new model; the source is untouched. |
contraction_ratio | 0.5 | 0.1–0.9 | How aggressively to prune within each targeted layer. Higher removes more neurons → smaller model, but more quality risk. 0.5 is a balanced starting point. |
num_layers_to_contract | 8 | 1–all | How many layers to process. More layers → more total shrinkage. |
Choosing the aggressiveness:
| Goal | contraction_ratio | Expect |
|---|---|---|
| Conservative — protect quality | 0.25 | ~5–10% smaller, negligible quality change. |
| Balanced (recommended) | 0.5 | ~10–20% smaller, minimal quality loss. |
| Aggressive — maximize savings | 0.75 | ~20–30% smaller; verify quality on your eval set. |
Verify before you ship
Contraction holds quality well, but the right ratio depends on your model and your tolerance. Check the result's perplexity (ppl_final) against the source and run your own evals before promoting an aggressively-contracted model to production.
Pricing
Contraction is a flat price per run, scaled by the size of the model you contract (processing a 1T model is not the same as a 1.5B). See the contraction tiers — e.g. $9 for a ≤3B model, $150 for a 70B. Estimate it first:
bash
curl https://console.axomlabs.ai/api/pricing/estimate \
-H "Authorization: Bearer $AXOM_KEY" -H "Content-Type: application/json" \
-d '{ "type": "contract", "params_before": 70000000000 }'What you get back
A new, smaller model and a result summary showing the parameter reduction (params_before → params_after) and the quality held (ppl_final). Monitor the run and download it like any other.
Next
- Train a model — the forward operation.
- Settings reference — all knobs in one place.