Jobs & runs

A job is one run — either a training run (type: "train") or a contraction run (type: "contract"). This is the endpoint reference; for the how-and-why, see the guides: Train a model, Contract a model, Monitor & retrieve.

Submit a run — `POST /jobs`

Returns the created job immediately (status queued); the run executes asynchronously.

Training run

json

// request
{ "type": "train", "idempotency_key": "run-1",
  "base_model_id": "<uuid>", "dataset_id": "<uuid>",
  "hyperparams": { "turns": 2, "output_name": "my-model" } }

Field	Type	Notes
`type`	`"train"`	Required.
`idempotency_key`	string	Required. Re-using a key returns the same job.
`base_model_id`	uuid	Required. The model to grow — from `GET /base-models` or your own `GET /models`.
`dataset_id`	uuid	Required. From `GET /datasets`.
`hyperparams.turns`	int	Quality lever — full training passes (default `1`; `2` recommended).
`hyperparams.output_name`	string	Name for the new model.
`hyperparams.cycles` / `steps_per_cycle` / `lr` / `seq_len`	num	Advanced (optional) — standard tuning knobs; all have defaults.

→ Full setting guidance: Settings reference.

Fine-tune run

Adapt a model's behavior (LoRA) — no size change. → Fine-tune guide.

json

// request
{ "type": "finetune", "idempotency_key": "ft-1",
  "source_model_id": "<uuid>", "dataset_id": "<uuid>",
  "hyperparams": { "lora_rank": 32, "steps": 2000, "output_name": "my-model" } }

Field	Type	Notes
`type`	`"finetune"`	Required.
`source_model_id`	uuid	The model to fine-tune (top-level or in `hyperparams`). Or use `base_model_id` for a catalog base.
`dataset_id`	uuid	Required.
`hyperparams.lora_rank`	int	Adapter capacity (default `32`).
`hyperparams.steps`	int	Training steps (default `2000`).
`hyperparams.output_name`	string	Name for the new model.

Contraction run

json

// request
{ "type": "contract", "idempotency_key": "contract-1",
  "hyperparams": { "source_model_id": "<uuid>",
                   "contraction_ratio": 0.5, "num_layers_to_contract": 8 } }

Field	Type	Notes
`type`	`"contract"`	Required.
`idempotency_key`	string	Required.
`hyperparams.source_model_id`	uuid	Required. The model to shrink.
`hyperparams.contraction_ratio`	float	Prune aggressiveness, 0–1 (default `0.5`).
`hyperparams.num_layers_to_contract`	int	Layers to process (default `8`).

No dataset_id — contraction operates on the model itself.

Response (`201`)

json

{ "id": "<uuid>", "type": "train", "status": "queued", "retries": 0,
  "last_checkpoint_step": 0, "output_model_id": null,
  "created_at": "…", "started_at": null, "finished_at": null }

Recommended config — `POST /training/recommend`

A tailored starting point + safe window for a given model and dataset, so you don't start from one-size-fits-all defaults. Learning rate scales with model size; training amount scales with dataset size.

json

// request
{ "base_model_id": "<uuid>", "dataset_id": "<uuid>" }

// response
{ "model_params": 7000000000, "dataset_tokens_est": 2100000,
  "recommended": { "turns": 2, "lr": 7e-5, "cycles": 6, "steps_per_cycle": 1000, "seq_len": 256 },
  "windows": { "lr": {"min":3.5e-5,"max":1.4e-4}, "cycles": {"min":3,"max":10}, "...": {} },
  "notes": ["≈3 epochs over your ~2.1M-token dataset", "Learning rate tuned for a ~7B model"],
  "disclaimer": "Recommended starting points — tune within the window for your model and data." }

You can also pass params / dataset_tokens / dataset_bytes directly instead of IDs.

The job object

Returned by GET /jobs (list) and GET /jobs/{id} (single).

Field	Meaning
`id`	Job ID — used for all follow-up calls.
`type`	`train` or `contract`.
`status`	Lifecycle state — see the lifecycle.
`retries`	Times reclaimed after an interruption (resumes from checkpoint; not an error).
`last_checkpoint_step`	Resume point of the last durable checkpoint.
`output_model_id`	`null` until complete, then your new model's ID.
`created_at` / `started_at` / `finished_at`	Timestamps.

`GET /jobs`

List your jobs, newest first.

`GET /jobs/{id}`

A single job object.

Telemetry

`GET /jobs/{id}/metrics?since_step=N`

Per-step training telemetry. Pass since_step to fetch only points newer than the last you have, and append client-side (this drives the live loss curve).

json

{ "job_id": "…", "count": 50, "latest_step": 1000,
  "points": [
    { "ts": "…", "step": 620, "cycle": 1, "phase": "train",
      "loss": 0.78, "lr": 2.1e-05, "throughput": 1.39, "vram_gb": 23.6, "grad_norm": 1.02 }
  ] }

Point fields: step, cycle, phase, loss, lr, throughput (tok/s), vram_gb, grad_norm. → How to read these.

`GET /jobs/{id}/logs?since=<seq>`

The live console tail — curated progress lines (phases + per-step training output). Poll with ?since=<latest_seq> and append the new lines, identical to the metrics pattern.

json

{ "job_id": "…", "latest_seq": 1862,
  "lines": [
    { "seq": 1841, "ts": "…", "stream": "stdout", "text": "step 2000/6000 · loss 1.83 · lr 3.1e-05" },
    { "seq": 1842, "ts": "…", "stream": "stdout", "text": "Training complete — saving your model…" }
  ] }

The tail is capped to the most recent lines for live viewing.

`GET /jobs/{id}/events`

The run timeline — [ { ts, kind, payload } ], every state transition.

`GET /jobs/{id}/result`

Headline result for a finished run (404 until it exists).

json

{ "job_id": "…",
  "ppl_baseline": 11023, "ppl_final": 324, "ppl_delta_pct": -97.06, "loss_final": 0.327,
  "params_before": 1540000000, "params_after": 2360000000, "params_delta": 820000000,
  "function_preservation": 0.0009,
  "gpu_seconds": 1418, "tokens_processed": 1536000, "cost_usd": 25.00 }

Field	Notes
`ppl_baseline` / `ppl_final` / `ppl_delta_pct`	Quality: perplexity before → after, and % change.
`params_before` / `after` / `delta`	Size before/after and the change.
`function_preservation`	How cleanly capacity was added (near-zero = clean).
`gpu_seconds` / `tokens_processed`	Compute meters.
`cost_usd`	Billed amount — see Pricing.

→ Field-by-field interpretation.

Jobs & runs ​

Submit a run — POST /jobs ​

Training run ​

Fine-tune run ​

Contraction run ​

Response (201) ​

Recommended config — POST /training/recommend ​

The job object ​

GET /jobs ​

GET /jobs/{id} ​

Telemetry ​

GET /jobs/{id}/metrics?since_step=N ​

GET /jobs/{id}/logs?since=<seq> ​

GET /jobs/{id}/events ​

GET /jobs/{id}/result ​

Jobs & runs

Submit a run — `POST /jobs`

Training run

Fine-tune run

Contraction run

Response (`201`)

Recommended config — `POST /training/recommend`

The job object

`GET /jobs`

`GET /jobs/{id}`

Telemetry

`GET /jobs/{id}/metrics?since_step=N`

`GET /jobs/{id}/logs?since=<seq>`

`GET /jobs/{id}/events`

`GET /jobs/{id}/result`