Contract a model

Contraction is the inverse of Fusion: it shrinks a model you already have by identifying and pruning its lowest-value neurons, producing a smaller model that keeps nearly all of the original's quality. It's how you trade a little headroom for cheaper, faster inference.

When to use it

Cut inference cost — a 10–25% smaller model is cheaper and faster to serve, for every request, forever.
Fit a tighter budget — bring a model down to a size that runs on smaller hardware.
After a growth run — grow aggressively to learn, then contract to ship a lean production model. Grow-then-contract is a common pipeline.

Contraction needs no dataset — it operates on the model's own structure.

Run one

bash

curl https://console.axomlabs.ai/api/jobs \
  -H "Authorization: Bearer $AXOM_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "contract",
    "idempotency_key": "contract-support-bot-1",
    "hyperparams": {
      "source_model_id": "<model-id>",
      "contraction_ratio": 0.5,
      "num_layers_to_contract": 8
    }
  }'

You get back a queued job, same shape as a training run. The smaller model appears in your account on completion.

Settings

Setting	Default	Range	What it controls
`source_model_id`	— (required)	—	The trained model to shrink. The result is a new model; the source is untouched.
`contraction_ratio`	`0.5`	0.1–0.9	How aggressively to prune within each targeted layer. Higher removes more neurons → smaller model, but more quality risk. `0.5` is a balanced starting point.
`num_layers_to_contract`	`8`	1–all	How many layers to process. More layers → more total shrinkage.

Choosing the aggressiveness:

Goal	`contraction_ratio`	Expect
Conservative — protect quality	`0.25`	~5–10% smaller, negligible quality change.
Balanced (recommended)	`0.5`	~10–20% smaller, minimal quality loss.
Aggressive — maximize savings	`0.75`	~20–30% smaller; verify quality on your eval set.

Verify before you ship

Contraction holds quality well, but the right ratio depends on your model and your tolerance. Check the result's perplexity (ppl_final) against the source and run your own evals before promoting an aggressively-contracted model to production.

Pricing

Contraction is a flat price per run, scaled by the size of the model you contract (processing a 1T model is not the same as a 1.5B). See the contraction tiers — e.g. $9 for a ≤3B model, $150 for a 70B. Estimate it first:

bash

curl https://console.axomlabs.ai/api/pricing/estimate \
  -H "Authorization: Bearer $AXOM_KEY" -H "Content-Type: application/json" \
  -d '{ "type": "contract", "params_before": 70000000000 }'

What you get back

A new, smaller model and a result summary showing the parameter reduction (params_before → params_after) and the quality held (ppl_final). Monitor the run and download it like any other.

Train a model — the forward operation.
Settings reference — all knobs in one place.

Contract a model ​

When to use it ​

Run one ​

Settings ​

Pricing ​

What you get back ​

Next ​