Train a model

A training run grows a base model and trains the new capacity on your data, producing a larger, stronger model in your account. Your original model is never modified. This page walks through a run end to end, then explains every setting and how to choose it.

Before you start

You need three things:

1. An API key: create one and export it.

```bash
export AXOM_KEY="axom_live_…"
```

2. A base model to grow: list the shared catalog with `GET /base-models`, or use a model already in your account from `GET /models`. Either way you need its `id`.

```bash
curl https://console.axomlabs.ai/api/base-models -H "Authorization: Bearer $AXOM_KEY"
```

3. A dataset: uploaded and visible in `GET /datasets`. See Datasets for the upload flow.

The shortest path

```bash
curl https://console.axomlabs.ai/api/jobs \
  -H "Authorization: Bearer $AXOM_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "train",
    "idempotency_key": "run-2026-06-04-a",
    "base_model_id": "<base-model-id>",
    "dataset_id": "<dataset-id>",
    "hyperparams": {
      "turns": 2,
      "output_name": "my-grown-model"
    }
  }'
```

You get back a queued job. The orchestrator picks it up within seconds, runs it on a GPU worker, and the model appears in your account when it finishes. Then you monitor it and download the weights.
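For scripted submissions, the same request can be built in Python. This is a minimal sketch using only the standard library: `build_train_request` is a hypothetical helper name, not part of any official client, and the URL, headers, and body fields are copied from the curl call above.

```python
import json
import urllib.request

API_BASE = "https://console.axomlabs.ai/api"  # same host as the curl example


def build_train_request(api_key, base_model_id, dataset_id,
                        idempotency_key, turns=2, output_name=None):
    """Build (but do not send) the POST /jobs request for a training run."""
    hyperparams = {"turns": turns}
    if output_name is not None:
        hyperparams["output_name"] = output_name
    body = {
        "type": "train",
        "idempotency_key": idempotency_key,
        "base_model_id": base_model_id,
        "dataset_id": dataset_id,
        "hyperparams": hyperparams,
    }
    return urllib.request.Request(
        f"{API_BASE}/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is a call to `urllib.request.urlopen(req)`. If the connection drops, rebuild the request with the same `idempotency_key` and send it again.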

Every field, explained

Top-level

| Field | Required | What it is |
| --- | --- | --- |
| `type` | ✓ | `"train"` for a growth+training run. |
| `idempotency_key` | ✓ | A string you choose, unique per logical run. Re-submitting the same key returns the existing job instead of starting a second one, so retrying after a dropped connection is always safe. Use a slug like `customer-42-run-3`. |
| `base_model_id` | ✓ | The model to grow. Its architecture and starting weights are the foundation; Fusion adds capacity on top. |
| `dataset_id` | ✓ | The data the new capacity learns from. Quality and relevance matter more than raw size. |
| `hyperparams` | – | The training settings below. Sensible defaults apply if omitted. |
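One way to keep `idempotency_key` stable across retries is to derive it from the run's logical identity instead of generating a random value per attempt. The sketch below is a client-side convention, not something the API requires: `make_idempotency_key` is a hypothetical helper, and hashing the request fields is only one of several reasonable schemes.

```python
import hashlib
import json


def make_idempotency_key(prefix, base_model_id, dataset_id, hyperparams):
    """Return a key that is identical for identical logical runs.

    Hypothetical helper: the API only requires a unique string per run.
    """
    # Canonical JSON (sorted keys, no whitespace) so dict ordering
    # cannot change the resulting hash.
    canonical = json.dumps(
        {
            "base_model_id": base_model_id,
            "dataset_id": dataset_id,
            "hyperparams": hyperparams,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}-{digest}"
```

With this scheme a retry of the same configuration reuses the key automatically. To deliberately run the same configuration a second time, change the prefix (for example `customer-42-run-4`).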

Training settings (`hyperparams`)

| Setting | Default | What it controls |
| --- | --- | --- |
| `turns` | `1` | The one lever that matters. A turn is a complete training pass. Each additional turn refines the model further: more quality for proportionally more time and cost. |
| `output_name` | auto | The name your new model gets in the catalog. |

Turns — the quality dial

Turns are the headline control: pick how hard you want the model trained. Each turn is a full pass; the next turn picks up where the last left off and pushes quality further.

| Turns | Speed | Quality | Use it for |
| --- | --- | --- | --- |
| 1 | Fastest | Good | Quick iterations, validating a dataset, cost-sensitive runs. |
| 2 | ~2× time | Markedly better | Recommended: the biggest quality-per-dollar jump. |
| 3 | ~3× time | Best | Hard tasks where you want every point of quality. |

In our own testing on Falcon-H1 1.5B, a second turn cut perplexity from ~17 to ~11, a large quality gain for double the training time. A third turn adds more on top, with diminishing returns.

Turns and cost

Each turn processes one pass of training tokens, so `turns` scales the training portion of your bill linearly; the one-time growth fee is unchanged. A 2-turn run costs about one growth fee plus 2× the single-turn training cost. Estimate it before you launch.
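The linear relationship can be written out as a small function. This is only a sketch of the arithmetic described here: `estimate_run_cost` is a hypothetical name, and the dollar amounts in the comments are made-up placeholders, not real prices. Actual figures come from `/pricing/estimate`.

```python
def estimate_run_cost(growth_fee, training_cost_per_turn, turns):
    """Total = one-time growth fee + turns x per-turn training cost."""
    if turns < 1:
        raise ValueError("turns must be at least 1")
    return growth_fee + turns * training_cost_per_turn


# Placeholder numbers for illustration only (not real pricing):
# growth fee 10.0, training cost 4.0 per turn.
one_turn = estimate_run_cost(10.0, 4.0, 1)    # 14.0
two_turns = estimate_run_cost(10.0, 4.0, 2)   # 18.0
three_turns = estimate_run_cost(10.0, 4.0, 3) # 22.0
```

Each extra turn adds the same increment, while the growth fee is paid once regardless of the turn count.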

Advanced: per-turn tuning

By default a turn is a full pass. To change how much training happens within each turn, set `cycles` (default 6) and `steps_per_cycle` (default 1000), or override with an explicit `train_steps`. Most users never need these; `turns` is the right lever.
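To reason about how these settings interact, the sketch below computes the step count for one turn. It rests on two assumptions that this page does not state outright: that a turn runs `cycles × steps_per_cycle` steps, and that `train_steps`, when given, replaces that product. `steps_in_turn` is a hypothetical helper, so check the Settings reference before relying on the numbers.

```python
from typing import Optional


def steps_in_turn(cycles: int = 6,
                  steps_per_cycle: int = 1000,
                  train_steps: Optional[int] = None) -> int:
    """Steps in one turn, under the assumptions named above."""
    if train_steps is not None:
        # ASSUMPTION: an explicit train_steps overrides the product.
        return train_steps
    # ASSUMPTION: a turn runs cycles * steps_per_cycle steps.
    return cycles * steps_per_cycle
```

Under these assumptions the documented defaults give 6 × 1000 = 6000 steps per turn, and halving `cycles` halves the training done in each turn.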

The response

POST /jobs returns the created job immediately — it does not wait for training:

```json
{
  "id": "bcc91e6d-0cc1-411d-96a3-bc1eb203f9cb",
  "type": "train",
  "status": "queued",
  "retries": 0,
  "last_checkpoint_step": 0,
  "output_model_id": null,
  "created_at": "2026-06-04T02:30:00Z",
  "started_at": null,
  "finished_at": null
}
```

| Field | Meaning |
| --- | --- |
| `id` | The job ID. Use it for all follow-up calls. |
| `status` | Where the run is in its lifecycle. Starts at `queued`. |
| `retries` | How many times the run was reclaimed after a worker interruption (resumes from checkpoint; not an error). |
| `last_checkpoint_step` | The step the last durable checkpoint was written at; the resume point. |
| `output_model_id` | `null` until the run completes, then the ID of your new model. |
| `created_at` / `started_at` / `finished_at` | Timestamps; `started_at`/`finished_at` fill in as the run progresses. |
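Because the job is returned before training starts, a script usually polls until `finished_at` is filled in. The sketch below shows one way to do that. It assumes a `GET /jobs/{id}` endpoint that returns the same job object, which this page does not document, and `fetch_job` and `wait_for_job` are hypothetical helper names. It checks `finished_at` rather than specific `status` values, since only `queued` is named here.

```python
import json
import os
import time
import urllib.request

API_BASE = "https://console.axomlabs.ai/api"


def fetch_job(job_id):
    # ASSUMPTION: GET /jobs/{id} exists and returns the job object
    # in the same shape as the POST /jobs response.
    req = urllib.request.Request(
        f"{API_BASE}/jobs/{job_id}",
        headers={"Authorization": f"Bearer {os.environ['AXOM_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch=fetch_job, interval_s=30.0, sleep=time.sleep):
    """Poll until the job reports a finished_at timestamp, then return it."""
    while True:
        job = fetch(job_id)
        if job.get("finished_at") is not None:
            return job
        sleep(interval_s)
```

After it returns, check `output_model_id`: a non-null value is the ID of the new model. The `fetch` and `sleep` parameters exist so the loop can be exercised without network access.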

Estimate the cost first

Always free, always exact:

```bash
curl https://console.axomlabs.ai/api/pricing/estimate \
  -H "Authorization: Bearer $AXOM_KEY" -H "Content-Type: application/json" \
  -d '{ "type": "train", "params_before": 1500000000, "params_after": 2360000000,
        "turns": 2 }'
```

See Pricing for the full model and the Estimate API.
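To compare turn counts side by side, one estimate request per option is enough. The sketch below only builds the request bodies, using the fields from the curl call above. `estimate_bodies` is a hypothetical helper, and the response shape is not shown on this page, so parsing it is left out.

```python
def estimate_bodies(params_before, params_after, turn_options=(1, 2, 3)):
    """One /pricing/estimate request body per candidate turn count."""
    return [
        {
            "type": "train",
            "params_before": params_before,
            "params_after": params_after,
            "turns": turns,
        }
        for turns in turn_options
    ]


# Same parameter counts as the curl example: 1.5B grown to 2.36B.
bodies = estimate_bodies(1_500_000_000, 2_360_000_000)
```

Each body can be posted to `/pricing/estimate` exactly as in the curl example, giving a price for 1, 2, and 3 turns before anything is launched.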

What happens after you submit

1. The run is queued and a GPU worker is provisioned.
2. The model is grown, then trained for your configured number of turns.
3. Progress streams as live metrics (loss, learning rate, throughput) and a status timeline.
4. On completion, a result summary is written and your new model becomes downloadable.

Interruptions are a non-event: a run that loses its worker resumes from its last checkpoint automatically. You'll see `retries` tick up, nothing more.

Finding the right config (it's iterative)

There's no universal magic number — the best amount of training depends on your model and your data, and dialing it in is normal. The console suggests a tailored starting point for your specific model + dataset (via POST /training/recommend) — learning rate scaled to model size, training amount to dataset size — so you don't begin from one-size-fits-all defaults. Treat these as starting points, not guarantees:

| Your model | Start with |
| --- | --- |
| Small (≤ ~3B) | `turns: 2`, the validated sweet spot. |
| Larger / unfamiliar | `turns: 1` first to see the trajectory, then add turns. |

A good, cheap workflow:

1. Run 1 turn and watch the live loss and the resulting perplexity. A 1-turn run is the cheapest way to sanity-check your data and see where quality lands.
2. Still improving at the end? Add a turn (or, advanced, more `steps_per_cycle`). Flat almost immediately? That usually points at the dataset, not the amount.
3. Repeat until the quality/cost trade-off is where you want it.
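The decision in step 2 can be roughed out in code once you have the loss values from a run. The sketch below is a heuristic of our own for illustration, not something the platform computes: `suggest_next_step`, the 20% window, and the 1% threshold are all hypothetical choices, and it assumes positive loss values.

```python
from statistics import mean


def suggest_next_step(losses, window=0.2, min_gain=0.01):
    """Classify a loss curve as 'add_turn', 'check_dataset', or 'stop'.

    Hypothetical heuristic: compares the last window of the curve with
    the window just before it, and with the very first window.
    """
    if len(losses) < 10:
        raise ValueError("need at least 10 loss points")
    n = max(1, int(len(losses) * window))
    early = mean(losses[:n])          # start of the run
    prev = mean(losses[-2 * n:-n])    # window just before the end
    last = mean(losses[-n:])          # final window
    tail_gain = (prev - last) / prev
    total_gain = (early - last) / early
    if tail_gain > min_gain:
        return "add_turn"        # still improving at the end
    if total_gain <= min_gain:
        return "check_dataset"   # never really improved
    return "stop"                # improved, then plateaued
```

A curve that is still falling in its final stretch suggests another turn, while one that barely moved from the start points back at the data. Treat the output as a prompt to look at the curve, not as a verdict.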

Expect to run a handful of short tests to find your model's sweet spot. That is normal for any training workflow, and 1-turn runs keep it inexpensive. Use `/pricing/estimate` to price each option before you launch.

Proven configs

Configurations we've validated in our own testing. We add to this as we test more models — a known-good starting point so you skip straight to the good runs.

| Base model | Config | Result |
| --- | --- | --- |
| Falcon-H1 1.5B | `turns: 2` | Perplexity ~17 → ~11 (2nd turn). Recommended. |

More coming as we test additional models.

TIP

The defaults are tuned from our own testing on a 1.5B model. They're a sensible place to begin for any model — but the few extra test runs it takes to tune for your model and data are worth it, and cheap.

Recipes

Cheap validation run, to confirm your data and wiring for a couple of dollars:

```json
{ "type": "train", "idempotency_key": "smoke-1", "base_model_id": "…",
  "dataset_id": "…", "hyperparams": { "turns": 1 } }
```

Production run, the recommended quality-per-dollar sweet spot:

```json
{ "type": "train", "idempotency_key": "prod-1", "base_model_id": "…",
  "dataset_id": "…", "hyperparams": { "turns": 2, "output_name": "support-bot-v1" } }
```

Next

Monitor a run: follow it live and retrieve the model.
Contract a model: the inverse, shrink a model you already have.
Settings reference: every knob, defaults, and tuning advice.
