Train a model
A training run grows a base model and trains the new capacity on your data, producing a larger, stronger model in your account. Your original is never modified. This page walks through the whole process end to end, then explains every setting and how to choose it.
Before you start
You need three things:
- An API key. Create one and export it:
bash
export AXOM_KEY="axom_live_…"
- A base model to grow. List the shared catalog with GET /base-models, or use a model already in your account from GET /models. Either way you need its id.
bash
curl https://console.axomlabs.ai/api/base-models -H "Authorization: Bearer $AXOM_KEY"
- A dataset, uploaded and visible in GET /datasets. See Datasets for the upload flow.
The shortest path
bash
curl https://console.axomlabs.ai/api/jobs \
  -H "Authorization: Bearer $AXOM_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "train",
    "idempotency_key": "run-2026-06-04-a",
    "base_model_id": "<base-model-id>",
    "dataset_id": "<dataset-id>",
    "hyperparams": {
      "turns": 2,
      "output_name": "my-grown-model"
    }
  }'
You get back a queued job. The orchestrator picks it up within seconds, runs it on a GPU worker, and the model appears in your account when it finishes. Then you monitor it and download the weights.
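The same request can be sent from a script. The sketch below is a minimal Python equivalent of the curl call, using only the standard library. The helper names `build_train_job` and `submit_job` are illustrative and not part of any official client; the code assumes the endpoint accepts exactly the JSON fields shown in the curl example.

```python
import json
import os
import urllib.request

API_BASE = "https://console.axomlabs.ai/api"  # base URL taken from the curl examples


def build_train_job(base_model_id, dataset_id, idempotency_key, turns=None, output_name=None):
    """Build the JSON body for POST /jobs (hypothetical helper)."""
    hyperparams = {}
    if turns is not None:
        hyperparams["turns"] = turns
    if output_name is not None:
        hyperparams["output_name"] = output_name
    body = {
        "type": "train",
        "idempotency_key": idempotency_key,
        "base_model_id": base_model_id,
        "dataset_id": dataset_id,
    }
    if hyperparams:  # hyperparams is optional; leaving it out uses the defaults
        body["hyperparams"] = hyperparams
    return body


def submit_job(body, api_key=None):
    """POST the body to /jobs and return the decoded job object."""
    api_key = api_key or os.environ["AXOM_KEY"]
    request = urllib.request.Request(
        f"{API_BASE}/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping payload construction separate from the network call makes the body easy to inspect or log before anything is submitted.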
Every field, explained
Top-level
| Field | Required | What it is |
|---|---|---|
| type | ✓ | "train" for a growth+training run. |
| idempotency_key | ✓ | A string you choose, unique per logical run. Re-submitting the same key returns the existing job instead of starting a second one, so retrying after a dropped connection is always safe. Use a slug like customer-42-run-3. |
| base_model_id | ✓ | The model to grow. Its architecture and starting weights are the foundation; Fusion adds capacity on top. |
| dataset_id | ✓ | The data the new capacity learns from. Quality and relevance matter more than raw size. |
| hyperparams | – | The training settings below. Sensible defaults apply if omitted. |
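Because a repeated idempotency_key returns the existing job, a retry loop only has to resend the identical body. The following is a sketch of that pattern, not an official client: `submit` stands for any function of yours that POSTs the body to /jobs, and the attempt count and delay are arbitrary choices.

```python
import time
import urllib.error


def submit_with_retry(submit, body, attempts=3, delay_seconds=2.0):
    """Resend the same body, and so the same idempotency_key, after a connection error.

    `submit` is a caller-supplied function that POSTs the body to /jobs and
    returns the job (hypothetical; not part of the API).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if not body.get("idempotency_key"):
        raise ValueError("idempotency_key is required for safe retries")
    last_error = None
    for attempt in range(attempts):
        try:
            return submit(body)
        except urllib.error.HTTPError:
            raise  # the server answered with an error status; do not retry blindly
        except OSError as error:  # dropped connections and timeouts subclass OSError
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay_seconds)
    raise last_error
```

The key is never regenerated inside the loop; generating a fresh key per attempt would defeat the deduplication and could start a second run.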
Training settings (hyperparams)
| Setting | Default | What it controls |
|---|---|---|
| turns | 1 | The one lever that matters. A turn is a complete training pass. Each additional turn refines the model further: more quality for proportionally more time and cost. |
| output_name | auto | The name your new model gets in the catalog. |
Turns — the quality dial
Turns are the headline control: pick how hard you want the model trained. Each turn is a full pass; the next turn picks up where the last left off and pushes quality further.
| Turns | Speed | Quality | Use it for |
|---|---|---|---|
| 1 | Fastest | Good | Quick iterations, validating a dataset, cost-sensitive runs. |
| 2 | ~2× time | Markedly better | Recommended: the biggest quality-per-dollar jump. (The API default is 1.) |
| 3 | ~3× time | Best | Hard tasks where you want every point of quality. |
In our own testing on Falcon-H1 1.5B, a second turn cut perplexity from ~17 to ~11, a large quality gain for double the time. A third turn adds more on top, with diminishing returns.
Turns and cost
Each turn processes one pass of training tokens, so turns scales the training portion of your bill linearly; the one-time growth fee is unchanged. A 2-turn run costs about one growth fee + 2× the training. Estimate it before you launch.
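As a worked illustration of that arithmetic, the sketch below adds a one-time growth fee to a per-turn training cost. The two prices are placeholders invented for the example, not real pricing; actual figures come from the estimate endpoint.

```python
def run_cost(growth_fee, training_cost_per_turn, turns):
    """Total = one-time growth fee + turns x per-turn training cost."""
    if turns < 1:
        raise ValueError("turns must be at least 1")
    return growth_fee + turns * training_cost_per_turn


# Placeholder prices for illustration only, not real pricing.
GROWTH_FEE = 10.0
TRAINING_PER_TURN = 4.0

for turns in (1, 2, 3):
    print(turns, run_cost(GROWTH_FEE, TRAINING_PER_TURN, turns))
```

With these placeholder numbers, going from 1 to 2 turns raises the total from 14.0 to 18.0: one extra per-turn training cost, with no second growth fee.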
Advanced: per-turn tuning
By default a turn is a full pass. To change how much training happens within each turn, set cycles (default 6) and steps_per_cycle (default 1000), or override with an explicit train_steps. Most users never need these — turns is the right lever.
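A rough sketch of how these settings might combine is shown below. It assumes that the steps in one turn equal cycles × steps_per_cycle unless train_steps is given; the text implies this but does not state it, so treat the product as an assumption. Both function names are hypothetical.

```python
def steps_per_turn(cycles=6, steps_per_cycle=1000, train_steps=None):
    """Training steps in one turn, ASSUMING steps = cycles * steps_per_cycle.

    An explicit train_steps overrides the product.
    """
    if train_steps is not None:
        return train_steps
    return cycles * steps_per_cycle


def tuned_hyperparams(turns, **overrides):
    """Build a hyperparams object containing only the settings you changed."""
    allowed = {"cycles", "steps_per_cycle", "train_steps", "output_name"}
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    return {"turns": turns, **overrides}
```

For example, `tuned_hyperparams(2, steps_per_cycle=500)` yields `{"turns": 2, "steps_per_cycle": 500}`, which under the stated assumption would halve the work done in each turn relative to the defaults.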
The response
POST /jobs returns the created job immediately — it does not wait for training:
json
{
  "id": "bcc91e6d-0cc1-411d-96a3-bc1eb203f9cb",
  "type": "train",
  "status": "queued",
  "retries": 0,
  "last_checkpoint_step": 0,
  "output_model_id": null,
  "created_at": "2026-06-04T02:30:00Z",
  "started_at": null,
  "finished_at": null
}
| Field | Meaning |
|---|---|
| id | The job ID; use it for all follow-up calls. |
| status | Where the run is in its lifecycle. Starts at queued. |
| retries | How many times the run was reclaimed after a worker interruption (it resumes from checkpoint; this is not an error). |
| last_checkpoint_step | The step the last durable checkpoint was written at; the resume point. |
| output_model_id | null until the run completes, then the ID of your new model. |
| created_at / started_at / finished_at | Timestamps; started_at/finished_at fill in as the run progresses. |
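In client code it can help to decode this response into a typed object. The sketch below mirrors only the fields listed in the table; the `Job` class and `parse_job` helper are illustrative names, and any additional fields the API may return are ignored.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    id: str
    type: str
    status: str
    retries: int
    last_checkpoint_step: int
    output_model_id: Optional[str]
    created_at: str
    started_at: Optional[str]
    finished_at: Optional[str]

    @property
    def has_output(self) -> bool:
        """True once output_model_id has been filled in."""
        return self.output_model_id is not None


def parse_job(raw: str) -> Job:
    """Decode a job response, keeping only the fields declared on Job."""
    data = json.loads(raw)
    known = {name: data.get(name) for name in Job.__dataclass_fields__}
    return Job(**known)
```

Checking `has_output` rather than a specific status string avoids depending on status names that this page does not enumerate.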
Estimate the cost first
Always free, always exact:
bash
curl https://console.axomlabs.ai/api/pricing/estimate \
  -H "Authorization: Bearer $AXOM_KEY" -H "Content-Type: application/json" \
  -d '{ "type": "train", "params_before": 1500000000, "params_after": 2360000000,
        "turns": 2 }'
See Pricing for the full model and the Estimate API.
What happens after you submit
- The run is queued and a GPU worker is provisioned.
- The model is grown, then trained for your configured number of turns.
- Progress streams as live metrics (loss, learning rate, throughput) and a status timeline.
- On completion, a result summary is written and your new model becomes downloadable.
Interruptions are a non-event: a run that loses its worker resumes from its last checkpoint automatically — you'll see retries tick up, nothing more.
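A simple way to follow a run from a script is to poll the job until finished_at is set, noting any increase in retries along the way. The sketch below makes two assumptions: `fetch_job` is a function you supply that returns the current job as a dict (the read endpoint is covered under Monitor a run and is not assumed here), and a non-null finished_at is treated as the end of the run.

```python
import time


def wait_for_job(fetch_job, job_id, poll_seconds=15.0, max_polls=None):
    """Poll until finished_at is set, printing a note whenever retries increases.

    `fetch_job(job_id)` is a caller-supplied function returning the job as a
    dict (hypothetical; see Monitor a run for the real read endpoint).
    """
    seen_retries = 0
    polls = 0
    while True:
        job = fetch_job(job_id)
        if job["retries"] > seen_retries:
            seen_retries = job["retries"]
            print(
                f"worker interrupted; last checkpoint at step "
                f"{job['last_checkpoint_step']} (retries={seen_retries})"
            )
        if job["finished_at"] is not None:
            return job
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"job {job_id} still {job['status']} after {polls} polls")
        time.sleep(poll_seconds)
```

The returned job should still be inspected by the caller: output_model_id is what confirms that a new model was actually produced.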
Finding the right config (it's iterative)
There's no universal magic number — the best amount of training depends on your model and your data, and dialing it in is normal. The console suggests a tailored starting point for your specific model + dataset (via POST /training/recommend) — learning rate scaled to model size, training amount to dataset size — so you don't begin from one-size-fits-all defaults. Treat these as starting points, not guarantees:
| Your model | Start with |
|---|---|
| Small (≤ ~3B) | turns: 2 — the validated sweet spot. |
| Larger / unfamiliar | turns: 1 first to see the trajectory, then add turns. |
A good, cheap workflow:
- Run 1 turn and watch the live loss and the resulting perplexity. A 1-turn run is the cheapest way to sanity-check your data and see where quality lands.
- Still improving at the end? Add a turn (or, advanced, more steps_per_cycle). Flat almost immediately? That usually points at the dataset, not the amount.
- Repeat until the quality/cost trade-off is where you want it.
Expect to run a handful of short tests to find your model's sweet spot — that's expected with any local training, and 1-turn runs make it inexpensive. Use /pricing/estimate to price each option before you launch.
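The workflow above can be turned into a rough rule of thumb over a run's loss values. The sketch below is one illustrative heuristic, not a documented algorithm: it compares the relative improvement over the first and last quarter of the curve, and the 1% threshold is an arbitrary assumption you would tune yourself.

```python
def next_step(losses, flat_tolerance=0.01):
    """Suggest a next move from a run's loss curve (illustrative heuristic).

    Compares relative improvement over the first and last quarter of the run.
    flat_tolerance is an arbitrary threshold, not a documented value.
    Assumes all loss values are positive.
    """
    if len(losses) < 8:
        raise ValueError("need at least 8 loss values")
    quarter = len(losses) // 4
    early_gain = (losses[0] - losses[quarter]) / losses[0]
    late_gain = (losses[-quarter - 1] - losses[-1]) / losses[-quarter - 1]
    if early_gain < flat_tolerance:
        return "check the dataset"  # flat almost immediately
    if late_gain > flat_tolerance:
        return "add a turn"  # still improving at the end
    return "stop here"  # improved early, then levelled off
```

A curve that keeps dropping through its final quarter suggests another turn, while one that barely moves from the first step is more likely a data problem than a training-amount problem.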
Proven configs
Configurations we've validated in our own testing. We add to this as we test more models — a known-good starting point so you skip straight to the good runs.
| Base model | Config | Result |
|---|---|---|
| Falcon-H1 1.5B | turns: 2 | Perplexity ~17 → ~11 (2nd turn). Recommended. |
More coming as we test additional models.
TIP
The defaults are tuned from our own testing on a 1.5B model. They're a sensible place to begin for any model — but the few extra test runs it takes to tune for your model and data are worth it, and cheap.
Recipes
Cheap validation run, to confirm your data and wiring for a couple dollars:
json
{ "type": "train", "idempotency_key": "smoke-1", "base_model_id": "…",
  "dataset_id": "…", "hyperparams": { "turns": 1 } }
Production run, at the recommended quality-per-dollar sweet spot:
json
{ "type": "train", "idempotency_key": "prod-1", "base_model_id": "…",
  "dataset_id": "…", "hyperparams": { "turns": 2, "output_name": "support-bot-v1" } }
Next
- Monitor a run — follow it live and retrieve the model.
- Contract a model — the inverse: shrink a model you already have.
- Settings reference — every knob, defaults, and tuning advice.