
Datasets

A training run reads from a dataset in your account, referenced by dataset_id. Datasets are versioned and immutable — a new upload is a new version, never an overwrite.

How uploads work

The console never receives your data bytes. Instead it mints a short-lived presigned URL and you upload directly to object storage. This keeps large datasets off the control plane entirely.

your client ──(1) POST /uploads/presign──▶ console   (returns a presigned PUT URL)
your client ──(2) PUT <bytes>────────────▶ object storage   (direct, no proxy)

1. Request an upload URL

```bash
curl https://console.axomlabs.ai/api/uploads/presign \
  -H "Authorization: Bearer $AXOM_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "filename": "train.jsonl", "content_type": "application/jsonl" }'
```

```json
{ "upload_id": "…", "key": "tmp/…/train.jsonl",
  "url": "https://…signed…", "method": "PUT" }
```

2. Upload the bytes

```bash
curl -X PUT --upload-file train.jsonl "<url-from-step-1>"
```

The URL is scoped to that single object and expires shortly — request a fresh one per file.
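When you have more than one file, you can script the two steps together. The sketch below is minimal and assumes that `curl` and `jq` are installed and that `AXOM_KEY` is exported. The function names (`presign_body`, `presign_url`, `upload_files`) are hypothetical. The script relies only on the `url` field of the step 1 response and calls presign once per file, as the one-URL-per-file rule requires.

```bash
#!/usr/bin/env bash
# Sketch: upload several JSONL files with the two-step presign flow.
# Assumes AXOM_KEY is exported and curl + jq are installed.
# Function names are hypothetical, not part of any official client.
set -euo pipefail

API="https://console.axomlabs.ai/api"

# JSON body for the presign request; jq does the quoting, so unusual
# filenames still produce valid JSON.
presign_body() {
  jq -nc --arg f "$(basename "$1")" \
    '{filename: $f, content_type: "application/jsonl"}'
}

# Step 1: ask the console for a presigned PUT URL and print it.
# jq -e makes this fail if the response has no "url" field.
presign_url() {
  local resp
  resp="$(curl -fsS "$API/uploads/presign" \
    -H "Authorization: Bearer $AXOM_KEY" \
    -H "Content-Type: application/json" \
    -d "$(presign_body "$1")")" || return 1
  jq -er '.url' <<<"$resp"
}

# Step 2: PUT each file straight to object storage. A new URL is
# requested per file because each one covers a single object and
# expires quickly.
upload_files() {
  local file url
  for file in "$@"; do
    url="$(presign_url "$file")" || return 1
    curl -fsS --upload-file "$file" "$url"
    echo "uploaded $file" >&2
  done
}

# Run only when executed with arguments, e.g. ./upload.sh shard-*.jsonl
if [[ "${BASH_SOURCE[0]:-}" == "$0" && $# -gt 0 ]]; then
  upload_files "$@"
fi
```

The script requests each URL immediately before its PUT instead of presigning every file up front. This way, a URL cannot expire while earlier files are still uploading.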

Listing datasets

```bash
curl https://console.axomlabs.ai/api/datasets -H "Authorization: Bearer $AXOM_KEY"
```

```json
[ { "id": "…", "name": "wikitext-x2-chassis", "version": "v1",
    "shard_count": 8, "bytes": 2400000000, "status": "ready", "created_at": "…" } ]
```

Use a dataset's id as the dataset_id when you launch a run.
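To find a `dataset_id` programmatically, you can filter the list with `jq`. This is a sketch that assumes `jq` is available. It reads only fields visible in the sample response (`id`, `name`, `version`, `bytes`, `status`), and the script and function names are hypothetical. It keeps only entries whose `status` is `"ready"`, the only status value this page shows.

```bash
#!/usr/bin/env bash
# Sketch: look up ready datasets by name to get a dataset_id.
# Assumes AXOM_KEY is exported and curl + jq are installed.
# Function names are hypothetical.
set -euo pipefail

API="https://console.axomlabs.ai/api"

# GET /api/datasets, the same call as the curl example above.
list_datasets() {
  curl -fsS "$API/datasets" -H "Authorization: Bearer $AXOM_KEY"
}

# Read the dataset list on stdin; print "id<TAB>version<TAB>size" for
# each entry named $1 whose status is "ready". Size is bytes / 10^9.
ready_datasets_named() {
  jq -r --arg name "$1" '
    .[]
    | select(.name == $name and .status == "ready")
    | "\(.id)\t\(.version)\t\(.bytes / 1e9) GB"'
}

# Example: ./find-dataset.sh wikitext-x2-chassis
if [[ "${BASH_SOURCE[0]:-}" == "$0" && $# -gt 0 ]]; then
  list_datasets | ready_datasets_named "$1"
fi
```

Datasets are versioned, so a single name can match several lines, one per ready version. Pick the `id` of the version you intend to train on.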

Format

Datasets are JSONL: one JSON example per line, e.g. {"text": "..."}. Upload one or more .jsonl files. The trainer tokenizes them and cycles through them for the full length of the run. When you launch a training run with a dataset's id, the worker pulls your files directly from object storage and trains on them.
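Before uploading, you may want to convert raw text into this format and spot-check the result. This is a sketch that assumes `jq` is installed and that each example is a `{"text": ...}` object like the one above. The page does not say whether other keys are accepted, so the check is deliberately strict. `to_jsonl` and `check_jsonl` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: build a JSONL dataset from plain text and sanity-check it.
# Assumes jq is installed and each example is {"text": "..."}.
# Function names are hypothetical.
set -euo pipefail

# One {"text": ...} object per non-empty input line. jq -R reads raw
# lines and handles JSON escaping of quotes, backslashes, and tabs.
to_jsonl() {
  jq -Rc 'select(length > 0) | {text: .}' "$1"
}

# Print the line numbers of entries that are not JSON objects with a
# string "text" field; return non-zero if there are any.
check_jsonl() {
  local bad
  [[ -r "$1" ]] || { echo "cannot read $1" >&2; return 1; }
  # jq emits exactly one true/false per input line (unparseable lines
  # become null, hence false), so grep -n numbers match file lines.
  bad="$(jq -R '[try fromjson catch null][0]
               | type == "object" and (.text | type) == "string"' "$1" \
         | grep -n '^false$' | cut -d: -f1)" || true
  if [[ -n "$bad" ]]; then
    echo "$1: invalid lines: $(paste -sd, - <<<"$bad")" >&2
    return 1
  fi
}

# Example: ./prepare.sh corpus.txt train.jsonl
if [[ "${BASH_SOURCE[0]:-}" == "$0" && $# -eq 2 ]]; then
  to_jsonl "$1" >"$2"
  check_jsonl "$2"
fi
```

Datasets are immutable, so fixing a bad line after upload means creating a new version. Catching malformed entries locally is cheaper.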

Dataset registration

Creating and versioning datasets from raw uploads is managed in the console today; a dataset-registration API is on the roadmap. The upload primitive above and read access via GET /datasets are available now.

Fusion Training Console