Train a model

A training run grows a base model and trains the new capacity on your data, producing a larger, stronger model in your account. Your original is never modified. This page walks through the whole process end to end, then explains every setting and how to choose it.

Before you start

You need three things:

  1. An API key. Create one and export it:
    bash
    export AXOM_KEY="axom_live_…"
  2. A base model to grow — list the shared catalog with GET /base-models, or use a model already in your account from GET /models. Either way you need its id.
    bash
    curl https://console.axomlabs.ai/api/base-models -H "Authorization: Bearer $AXOM_KEY"
  3. A dataset — uploaded and visible in GET /datasets. See Datasets for the upload flow.

The shortest path

bash
curl https://console.axomlabs.ai/api/jobs \
  -H "Authorization: Bearer $AXOM_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "train",
    "idempotency_key": "run-2026-06-04-a",
    "base_model_id": "<base-model-id>",
    "dataset_id": "<dataset-id>",
    "hyperparams": {
      "turns": 2,
      "output_name": "my-grown-model"
    }
  }'

You get back a queued job. The orchestrator picks it up within seconds, runs it on a GPU worker, and the model appears in your account when it finishes. Then you monitor it and download the weights.
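
For callers who prefer a script to curl, the same request can be sketched in Python using only the standard library. The helper name `build_train_request` is illustrative and not part of any official SDK; the URL, headers, and body fields are the ones shown in the curl call above.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://console.axomlabs.ai/api"


def build_train_request(api_key: str, base_model_id: str, dataset_id: str,
                        idempotency_key: str, turns: int = 2,
                        output_name: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST /jobs request for a training run."""
    hyperparams = {"turns": turns}
    if output_name is not None:
        hyperparams["output_name"] = output_name
    body = {
        "type": "train",
        "idempotency_key": idempotency_key,
        "base_model_id": base_model_id,
        "dataset_id": dataset_id,
        "hyperparams": hyperparams,
    }
    return urllib.request.Request(
        f"{API_BASE}/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is a single `urllib.request.urlopen(request)` call. Reading the key from the `AXOM_KEY` environment variable, as in the shell example, keeps it out of the script.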

Every field, explained

Top-level

| Field | Required | What it is |
| --- | --- | --- |
| type | | "train" for a growth+training run. |
| idempotency_key | | A string you choose, unique per logical run. Re-submitting the same key returns the existing job instead of starting a second one, so retrying after a dropped connection is always safe. Use a slug like customer-42-run-3. |
| base_model_id | | The model to grow. Its architecture and starting weights are the foundation; Fusion adds capacity on top. |
| dataset_id | | The data the new capacity learns from. Quality and relevance matter more than raw size. |
| hyperparams | | The training settings below. Sensible defaults apply if omitted. |
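
The retry guarantee of `idempotency_key` only holds if every attempt sends the same key. A minimal sketch of that pattern follows. Both helper names are hypothetical, and `send` stands in for whatever function actually POSTs the payload to /jobs; it is assumed to raise `ConnectionError` when the connection drops.

```python
import time
from typing import Callable


def idempotency_key(customer_id: int, run_number: int) -> str:
    """Derive the key from the logical run, never from the attempt number."""
    return f"customer-{customer_id}-run-{run_number}"


def submit_with_retry(send: Callable[[dict], dict], payload: dict,
                      attempts: int = 3, backoff_seconds: float = 1.0) -> dict:
    """Re-send the identical payload (same idempotency_key) after a dropped connection."""
    for attempt in range(1, attempts + 1):
        try:
            return send(payload)
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            time.sleep(backoff_seconds * attempt)  # simple linear backoff
    raise ValueError("attempts must be at least 1")
```

Because the payload is built once and reused, a retry after an ambiguous failure can only return the existing job, not start a second run.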

Training settings (hyperparams)

| Setting | Default | What it controls |
| --- | --- | --- |
| turns | 1 | The one lever that matters. A turn is a complete training pass. Each additional turn refines the model further: more quality for proportionally more time and cost. |
| output_name | auto | The name your new model gets in the catalog. |

Turns — the quality dial

Turns are the headline control: pick how hard you want the model trained. Each turn is a full pass; the next turn picks up where the last left off and pushes quality further.

| Turns | Speed | Quality | Use it for |
| --- | --- | --- | --- |
| 1 | Fastest | Good | Quick iterations, validating a dataset, cost-sensitive runs. |
| 2 | ~2× time | Markedly better | Recommended setting (the API default is 1): the biggest quality-per-dollar jump. |
| 3 | ~3× time | Best | Hard tasks where you want every point of quality. |

In our own testing, a second turn cut perplexity from ~17 to ~11, a large quality gain for double the time. A third turn adds more on top, with diminishing returns.

Turns and cost

Each turn processes one pass of training tokens, so turns scales the training portion of your bill linearly; the one-time growth fee is unchanged. A 2-turn run costs about one growth fee plus 2× the cost of a single turn of training. Estimate it before you launch.
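
The billing rule above is simple arithmetic, sketched here with placeholder prices. The two dollar amounts are invented purely to show the shape of the calculation; real figures come from the estimate endpoint.

```python
def estimated_total(growth_fee: float, training_cost_per_turn: float, turns: int) -> float:
    """One-time growth fee plus a training cost that scales linearly with turns."""
    if turns < 1:
        raise ValueError("turns must be at least 1")
    return growth_fee + turns * training_cost_per_turn


# Placeholder prices, not real rates.
GROWTH_FEE = 10.00
TRAINING_PER_TURN = 4.00

for turns in (1, 2, 3):
    # prints 14.0, 18.0 and 22.0 for 1, 2 and 3 turns
    print(turns, estimated_total(GROWTH_FEE, TRAINING_PER_TURN, turns))
```

Each extra turn adds the same increment, while the growth fee appears once regardless of the turn count.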

Advanced: per-turn tuning

By default a turn is a full pass. To change how much training happens within each turn, set cycles (default 6) and steps_per_cycle (default 1000), or override with an explicit train_steps. Most users never need these — turns is the right lever.
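
This page does not spell out how the three settings combine, so the sketch below rests on an assumption: that the steps in one turn equal cycles × steps_per_cycle, and that an explicit train_steps replaces that product. It also assumes the keys sit inside hyperparams. Treat it as a way to reason about the defaults, not as the service's actual rule.

```python
DEFAULT_CYCLES = 6
DEFAULT_STEPS_PER_CYCLE = 1000


def steps_per_turn(hyperparams: dict) -> int:
    """Assumed rule: train_steps wins; otherwise cycles * steps_per_cycle."""
    if "train_steps" in hyperparams:
        return int(hyperparams["train_steps"])
    cycles = hyperparams.get("cycles", DEFAULT_CYCLES)
    steps = hyperparams.get("steps_per_cycle", DEFAULT_STEPS_PER_CYCLE)
    return int(cycles) * int(steps)
```

Under that assumption the documented defaults work out to 6 × 1000 = 6000 steps per turn.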

The response

POST /jobs returns the created job immediately — it does not wait for training:

json
{
  "id": "bcc91e6d-0cc1-411d-96a3-bc1eb203f9cb",
  "type": "train",
  "status": "queued",
  "retries": 0,
  "last_checkpoint_step": 0,
  "output_model_id": null,
  "created_at": "2026-06-04T02:30:00Z",
  "started_at": null,
  "finished_at": null
}

| Field | Meaning |
| --- | --- |
| id | The job ID. Use it for all follow-up calls. |
| status | Where the run is in its lifecycle. Starts at queued. |
| retries | How many times the run was reclaimed after a worker interruption (it resumes from checkpoint; this is not an error). |
| last_checkpoint_step | The step the last durable checkpoint was written at; the resume point. |
| output_model_id | null until the run completes, then the ID of your new model. |
| created_at / started_at / finished_at | Timestamps; started_at/finished_at fill in as the run progresses. |
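
Since output_model_id stays null until the run completes, a client can poll on that field alone. The sketch below does so. `fetch_job` is a hypothetical callable that returns the job JSON for an ID, for example from an assumed GET /jobs/{id} endpoint that this page does not document. A production loop would also stop on whatever failure status the API reports.

```python
import time
from typing import Callable, Optional


def wait_for_model(fetch_job: Callable[[str], dict], job_id: str,
                   poll_seconds: float = 30.0, max_polls: int = 1000) -> Optional[str]:
    """Poll until output_model_id is set; return it, or None if polls run out."""
    last_retries = 0
    for _ in range(max_polls):
        job = fetch_job(job_id)
        if job["retries"] > last_retries:
            # A higher retry count means the run resumed from a checkpoint.
            print(f"resumed from step {job['last_checkpoint_step']}")
            last_retries = job["retries"]
        if job["output_model_id"] is not None:
            return job["output_model_id"]
        time.sleep(poll_seconds)
    return None
```

The `max_polls` bound is there so that a run which never produces a model cannot keep the caller waiting forever.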

Estimate the cost first

Always free, always exact:

bash
curl https://console.axomlabs.ai/api/pricing/estimate \
  -H "Authorization: Bearer $AXOM_KEY" -H "Content-Type: application/json" \
  -d '{ "type": "train", "params_before": 1500000000, "params_after": 2360000000,
        "turns": 2 }'

See Pricing for the full model and the Estimate API.
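
To compare options side by side, one request body per candidate turns value can be generated and sent to /pricing/estimate. The helper name `estimate_payloads` is illustrative. The response shape is not shown on this page, so the sketch stops at building the bodies.

```python
import json


def estimate_payloads(params_before: int, params_after: int,
                      turn_options=(1, 2, 3)) -> list:
    """One /pricing/estimate request body per candidate turns value."""
    return [
        {"type": "train", "params_before": params_before,
         "params_after": params_after, "turns": turns}
        for turns in turn_options
    ]


# Same parameter counts as the curl example above.
for body in estimate_payloads(1_500_000_000, 2_360_000_000):
    print(json.dumps(body))
```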

What happens after you submit

  1. The run is queued and a GPU worker is provisioned.
  2. The model is grown, then trained for your configured number of turns.
  3. Progress streams as live metrics (loss, learning rate, throughput) and a status timeline.
  4. On completion, a result summary is written and your new model becomes downloadable.

Interruptions are a non-event: a run that loses its worker resumes from its last checkpoint automatically — you'll see retries tick up, nothing more.

Finding the right config (it's iterative)

There's no universal magic number — the best amount of training depends on your model and your data, and dialing it in is normal. The console suggests a tailored starting point for your specific model + dataset (via POST /training/recommend) — learning rate scaled to model size, training amount to dataset size — so you don't begin from one-size-fits-all defaults. Treat these as starting points, not guarantees:

| Your model | Start with |
| --- | --- |
| Small (≤ ~3B) | turns: 2, the validated sweet spot. |
| Larger / unfamiliar | turns: 1 first to see the trajectory, then add turns. |

A good, cheap workflow:

  1. Run 1 turn and watch the live loss and the resulting perplexity. A 1-turn run is the cheapest way to sanity-check your data and see where quality lands.
  2. Still improving at the end? Add a turn (or, advanced, more steps_per_cycle). Flat almost immediately? That usually points at the dataset, not the amount.
  3. Repeat until the quality/cost trade-off is where you want it.
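
The "still improving at the end?" check in step 2 can be made concrete with a simple heuristic: compare the average loss over the last few logged points with the average over the points just before them. The window size and the 1% threshold below are arbitrary illustrative choices, not values from the service, and the function assumes you have collected the loss values yourself as a list of positive numbers.

```python
from statistics import mean


def still_improving(losses: list, window: int = 3,
                    min_relative_drop: float = 0.01) -> bool:
    """Compare the mean loss of the last `window` points with the window before it."""
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    previous = mean(losses[-2 * window:-window])
    latest = mean(losses[-window:])
    return (previous - latest) / previous >= min_relative_drop
```

A `True` result suggests another turn may still pay off. A `False` result early in the run is the "flat almost immediately" case that points at the dataset.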

Plan on a handful of short test runs to find your model's sweet spot. That is normal with any local training, and 1-turn runs keep it inexpensive. Use /pricing/estimate to price each option before you launch.

Proven configs

Configurations we've validated in our own testing. We add to this as we test more models — a known-good starting point so you skip straight to the good runs.

| Base model | Config | Result |
| --- | --- | --- |
| Falcon-H1 1.5B | turns: 2 | Perplexity ~17 → ~11 (2nd turn). Recommended. |

More coming as we test additional models.

TIP

The defaults are tuned from our own testing on a 1.5B model. They're a sensible place to begin for any model — but the few extra test runs it takes to tune for your model and data are worth it, and cheap.

Recipes

Cheap validation run, to confirm your data and wiring for a couple of dollars:

json
{ "type": "train", "idempotency_key": "smoke-1", "base_model_id": "…",
  "dataset_id": "…", "hyperparams": { "turns": 1 } }

Production run — the recommended quality-per-dollar sweet spot:

json
{ "type": "train", "idempotency_key": "prod-1", "base_model_id": "…",
  "dataset_id": "…", "hyperparams": { "turns": 2, "output_name": "support-bot-v1" } }
