Skip to content

Contract a model

Contraction is the inverse of Fusion: it shrinks a model you already have by identifying and pruning its lowest-value neurons, producing a smaller model that keeps nearly all of the original's quality. It's how you trade a little headroom for cheaper, faster inference.

When to use it

  • Cut inference cost — a 10–25% smaller model is cheaper and faster to serve, for every request, forever.
  • Fit a tighter budget — bring a model down to a size that runs on smaller hardware.
  • After a growth run — grow aggressively to learn, then contract to ship a lean production model. Grow-then-contract is a common pipeline.

Contraction needs no dataset — it operates on the model's own structure.

Run one

bash
curl https://console.axomlabs.ai/api/jobs \
  -H "Authorization: Bearer $AXOM_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "contract",
    "idempotency_key": "contract-support-bot-1",
    "hyperparams": {
      "source_model_id": "<model-id>",
      "contraction_ratio": 0.5,
      "num_layers_to_contract": 8
    }
  }'

You get back a queued job, same shape as a training run. The smaller model appears in your account on completion.

Settings

SettingDefaultRangeWhat it controls
source_model_id— (required)The trained model to shrink. The result is a new model; the source is untouched.
contraction_ratio0.50.1–0.9How aggressively to prune within each targeted layer. Higher removes more neurons → smaller model, but more quality risk. 0.5 is a balanced starting point.
num_layers_to_contract81–allHow many layers to process. More layers → more total shrinkage.

Choosing the aggressiveness:

Goalcontraction_ratioExpect
Conservative — protect quality0.25~5–10% smaller, negligible quality change.
Balanced (recommended)0.5~10–20% smaller, minimal quality loss.
Aggressive — maximize savings0.75~20–30% smaller; verify quality on your eval set.

Verify before you ship

Contraction holds quality well, but the right ratio depends on your model and your tolerance. Check the result's perplexity (ppl_final) against the source and run your own evals before promoting an aggressively-contracted model to production.

Pricing

Contraction is a flat price per run, scaled by the size of the model you contract (processing a 1T model is not the same as a 1.5B). See the contraction tiers — e.g. $9 for a ≤3B model, $150 for a 70B. Estimate it first:

bash
curl https://console.axomlabs.ai/api/pricing/estimate \
  -H "Authorization: Bearer $AXOM_KEY" -H "Content-Type: application/json" \
  -d '{ "type": "contract", "params_before": 70000000000 }'

What you get back

A new, smaller model and a result summary showing the parameter reduction (params_beforeparams_after) and the quality held (ppl_final). Monitor the run and download it like any other.

Next

Fusion Training Console