Jobs & runs
A job is one run: a training run (type: "train"), a fine-tune run (type: "finetune"), or a contraction run (type: "contract"). This is the endpoint reference; for the how-and-why, see the guides: Train a model, Contract a model, Monitor & retrieve.
Submit a run — POST /jobs
Returns the created job immediately (status queued); the run executes asynchronously.
Training run
```json
// request
{ "type": "train", "idempotency_key": "run-1",
  "base_model_id": "<uuid>", "dataset_id": "<uuid>",
  "hyperparams": { "turns": 2, "output_name": "my-model" } }
```

| Field | Type | Notes |
|---|---|---|
| type | "train" | Required. |
| idempotency_key | string | Required. Re-using a key returns the same job. |
| base_model_id | uuid | Required. The model to grow, from GET /base-models or your own GET /models. |
| dataset_id | uuid | Required. From GET /datasets. |
| hyperparams.turns | int | Quality lever: number of full training passes (default 1; 2 recommended). |
| hyperparams.output_name | string | Name for the new model. |
| hyperparams.cycles / steps_per_cycle / lr / seq_len | num | Advanced (optional): standard tuning knobs; all have defaults. |

→ Full setting guidance: Settings reference.
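
As a rough illustration of the training request above, the following Python sketch assembles the body and posts it. The base URL `https://api.example.com`, the bearer-token header, and the helper names `build_train_request` and `submit_job` are assumptions for the example; none of them are specified by this reference.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # hypothetical; substitute your real base URL


def build_train_request(base_model_id, dataset_id, idempotency_key,
                        turns=2, output_name=None):
    """Assemble the POST /jobs body for a training run."""
    hyperparams = {"turns": turns}
    if output_name is not None:
        hyperparams["output_name"] = output_name
    return {
        "type": "train",
        "idempotency_key": idempotency_key,
        "base_model_id": base_model_id,
        "dataset_id": dataset_id,
        "hyperparams": hyperparams,
    }


def submit_job(body, token):
    """POST the body to /jobs and return the decoded job object.

    The Authorization header format is an assumption, not documented here.
    """
    req = urllib.request.Request(
        f"{API_BASE}/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because re-using an idempotency_key returns the same job, a client that times out can retry `submit_job` with the same body rather than generating a new key.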
Fine-tune run
Adapt a model's behavior (LoRA) — no size change. → Fine-tune guide.
```json
// request
{ "type": "finetune", "idempotency_key": "ft-1",
  "source_model_id": "<uuid>", "dataset_id": "<uuid>",
  "hyperparams": { "lora_rank": 32, "steps": 2000, "output_name": "my-model" } }
```

| Field | Type | Notes |
|---|---|---|
| type | "finetune" | Required. |
| source_model_id | uuid | The model to fine-tune (top-level or in hyperparams). Or use base_model_id for a catalog base. |
| dataset_id | uuid | Required. |
| hyperparams.lora_rank | int | Adapter capacity (default 32). |
| hyperparams.steps | int | Training steps (default 2000). |
| hyperparams.output_name | string | Name for the new model. |

Contraction run
```json
// request
{ "type": "contract", "idempotency_key": "contract-1",
  "hyperparams": { "source_model_id": "<uuid>",
                   "contraction_ratio": 0.5, "num_layers_to_contract": 8 } }
```

| Field | Type | Notes |
|---|---|---|
| type | "contract" | Required. |
| idempotency_key | string | Required. |
| hyperparams.source_model_id | uuid | Required. The model to shrink. |
| hyperparams.contraction_ratio | float | Prune aggressiveness, 0–1 (default 0.5). |
| hyperparams.num_layers_to_contract | int | Layers to process (default 8). |

No dataset_id — contraction operates on the model itself.
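
A contraction body can be validated client-side before it is sent. The sketch below uses the defaults from the table; the helper name `build_contract_request` is hypothetical, and treating the 0–1 range as inclusive and requiring at least one layer are assumptions rather than documented rules.

```python
def build_contract_request(source_model_id, idempotency_key,
                           contraction_ratio=0.5, num_layers_to_contract=8):
    """Assemble the POST /jobs body for a contraction run.

    Assumes contraction_ratio is valid on the closed interval [0, 1] and
    that num_layers_to_contract must be a positive integer.
    """
    if not 0.0 <= contraction_ratio <= 1.0:
        raise ValueError("contraction_ratio must be between 0 and 1")
    if num_layers_to_contract < 1:
        raise ValueError("num_layers_to_contract must be at least 1")
    return {
        "type": "contract",
        "idempotency_key": idempotency_key,
        # Note: source_model_id lives inside hyperparams, and there is no dataset_id.
        "hyperparams": {
            "source_model_id": source_model_id,
            "contraction_ratio": contraction_ratio,
            "num_layers_to_contract": num_layers_to_contract,
        },
    }
```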
Response (201)
```json
{ "id": "<uuid>", "type": "train", "status": "queued", "retries": 0,
  "last_checkpoint_step": 0, "output_model_id": null,
  "created_at": "…", "started_at": null, "finished_at": null }
```
Recommended config — POST /training/recommend
Returns a tailored starting configuration plus a safe tuning window for a given model and dataset, so you don't start from one-size-fits-all defaults. The recommended learning rate scales with model size; the training amount scales with dataset size.
```json
// request
{ "base_model_id": "<uuid>", "dataset_id": "<uuid>" }
// response
{ "model_params": 7000000000, "dataset_tokens_est": 2100000,
  "recommended": { "turns": 2, "lr": 7e-5, "cycles": 6, "steps_per_cycle": 1000, "seq_len": 256 },
  "windows": { "lr": {"min":3.5e-5,"max":1.4e-4}, "cycles": {"min":3,"max":10}, "...": {} },
  "notes": ["≈3 epochs over your ~2.1M-token dataset", "Learning rate tuned for a ~7B model"],
  "disclaimer": "Recommended starting points — tune within the window for your model and data." }
```
You can also pass params / dataset_tokens / dataset_bytes directly instead of IDs.
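
One way to use the windows object is to clamp your own overrides into the recommended range before submitting a run. The sketch below assumes each window entry has the min/max shape shown in the sample response; `clamp_to_windows` is an illustrative helper, not part of the API.

```python
def clamp_to_windows(hyperparams, windows):
    """Return a copy of hyperparams with each value clamped to its window.

    Settings without a usable {"min", "max"} window are passed through unchanged.
    """
    clamped = {}
    for name, value in hyperparams.items():
        window = windows.get(name)
        if window and "min" in window and "max" in window:
            value = min(max(value, window["min"]), window["max"])
        clamped[name] = value
    return clamped


# Windows copied from the sample response above.
windows = {"lr": {"min": 3.5e-5, "max": 1.4e-4}, "cycles": {"min": 3, "max": 10}}
overrides = {"turns": 2, "lr": 3e-4, "cycles": 2}
safe = clamp_to_windows(overrides, windows)
# lr is pulled down to 1.4e-4, cycles is raised to 3, turns is untouched.
```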
The job object
Returned by GET /jobs (list) and GET /jobs/{id} (single).
| Field | Meaning |
|---|---|
| id | Job ID; used for all follow-up calls. |
| type | train, finetune, or contract. |
| status | Lifecycle state; see the lifecycle. |
| retries | Times reclaimed after an interruption (resumes from checkpoint; not an error). |
| last_checkpoint_step | Resume point of the last durable checkpoint. |
| output_model_id | null until complete, then your new model's ID. |
| created_at / started_at / finished_at | Timestamps. |
GET /jobs
List your jobs, newest first.
GET /jobs/{id}
A single job object.
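
Since runs execute asynchronously, a client typically polls GET /jobs/{id} until the run ends. The sketch below is one possible loop; it assumes a non-null finished_at marks the end of a run (the terminal status values live in the lifecycle guide and are not repeated here), and `fetch_job` is a caller-supplied callable that performs the GET and returns the decoded job object.

```python
import time


def is_finished(job):
    """Treat a job as finished once finished_at is set (an assumption)."""
    return job.get("finished_at") is not None


def wait_for_job(fetch_job, job_id, interval_s=10.0, timeout_s=3600.0,
                 sleep=time.sleep):
    """Poll fetch_job(job_id) until the job finishes or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        if is_finished(job):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")
        sleep(interval_s)
```

After `wait_for_job` returns, inspect status and output_model_id on the returned object to see how the run ended.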
Telemetry
GET /jobs/{id}/metrics?since_step=N
Per-step training telemetry. Pass since_step to fetch only points newer than the last you have, and append client-side (this drives the live loss curve).
```json
{ "job_id": "…", "count": 50, "latest_step": 1000,
  "points": [
    { "ts": "…", "step": 620, "cycle": 1, "phase": "train",
      "loss": 0.78, "lr": 2.1e-05, "throughput": 1.39, "vram_gb": 23.6, "grad_norm": 1.02 }
  ] }
```
Point fields: step, cycle, phase, loss, lr, throughput (tok/s), vram_gb, grad_norm. → How to read these.
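
The since_step pattern can be sketched as a small client-side merge. This assumes step values increase monotonically within a job and that since_step excludes the step you pass; `merge_points` is an illustrative helper name.

```python
def merge_points(points, response):
    """Append points newer than the last one held; return the next since_step.

    Mutates `points` in place. Returns None when nothing has been received
    yet, in which case the caller can omit since_step entirely.
    """
    last_step = points[-1]["step"] if points else None
    for point in response.get("points", []):
        if last_step is None or point["step"] > last_step:
            points.append(point)
            last_step = point["step"]
    return last_step
```

Each poll then requests `/jobs/{id}/metrics?since_step=<returned value>` and feeds the response back into `merge_points`, so the accumulated list can be plotted directly as the loss curve.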
GET /jobs/{id}/logs?since=<seq>
The live console tail — curated progress lines (phases + per-step training output). Poll with ?since=<latest_seq> and append the new lines, identical to the metrics pattern.
```json
{ "job_id": "…", "latest_seq": 1862,
  "lines": [
    { "seq": 1841, "ts": "…", "stream": "stdout", "text": "step 2000/6000 · loss 1.83 · lr 3.1e-05" },
    { "seq": 1842, "ts": "…", "stream": "stdout", "text": "Training complete — saving your model…" }
  ] }
```
The tail is capped to the most recent lines for live viewing.
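
A client can mirror the capped tail with a bounded buffer. The sketch below keeps only the newest lines and tracks the cursor for the next ?since= call; the `LogTail` class and its 500-line default are assumptions for illustration, not documented limits.

```python
from collections import deque


class LogTail:
    """Client-side rolling buffer for /jobs/{id}/logs responses."""

    def __init__(self, max_lines=500):
        self.lines = deque(maxlen=max_lines)  # oldest lines fall off automatically
        self.latest_seq = None                # cursor to pass as ?since=

    def ingest(self, response):
        """Append unseen lines and advance the cursor; return how many were added."""
        added = 0
        for line in response.get("lines", []):
            if self.latest_seq is None or line["seq"] > self.latest_seq:
                self.lines.append(line)
                self.latest_seq = line["seq"]
                added += 1
        # Prefer the server-reported cursor when it is ahead of the lines received.
        reported = response.get("latest_seq")
        if reported is not None and (self.latest_seq is None or reported > self.latest_seq):
            self.latest_seq = reported
        return added
```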
GET /jobs/{id}/events
The run timeline: an array of { ts, kind, payload } entries, one per state transition.
GET /jobs/{id}/result
Headline result for a finished run (404 until it exists).
```json
{ "job_id": "…",
  "ppl_baseline": 11023, "ppl_final": 324, "ppl_delta_pct": -97.06, "loss_final": 0.327,
  "params_before": 1540000000, "params_after": 2360000000, "params_delta": 820000000,
  "function_preservation": 0.0009,
  "gpu_seconds": 1418, "tokens_processed": 1536000, "cost_usd": 25.00 }
```

| Field | Notes |
|---|---|
| ppl_baseline / ppl_final / ppl_delta_pct | Quality: perplexity before → after, and % change. |
| params_before / params_after / params_delta | Size before and after, and the change. |
| function_preservation | How cleanly capacity was added (near-zero = clean). |
| gpu_seconds / tokens_processed | Compute meters. |
| cost_usd | Billed amount; see Pricing. |
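
The derived fields in the sample result are consistent with simple arithmetic on the raw ones. The sketch below recomputes them, assuming ppl_delta_pct is the relative change from baseline expressed as a percentage and params_delta is a plain difference; these formulas are inferred from the sample values, not stated by the API.

```python
def derive_result_fields(result):
    """Recompute the derived quality and size fields from the raw ones."""
    ppl_delta_pct = (
        (result["ppl_final"] - result["ppl_baseline"]) / result["ppl_baseline"] * 100
    )
    params_delta = result["params_after"] - result["params_before"]
    return {"ppl_delta_pct": round(ppl_delta_pct, 2), "params_delta": params_delta}


# Raw values copied from the sample response above.
sample = {"ppl_baseline": 11023, "ppl_final": 324,
          "params_before": 1540000000, "params_after": 2360000000}
derived = derive_result_fields(sample)
# derived matches the sample: ppl_delta_pct -97.06, params_delta 820000000
```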