Create Upload
Mint a file_upload_connection_config row and return it together with a
presigned PUT URL.
The row id is generated by Postgres. The storage nonce in the S3 key is
only a path segment that prevents key collisions; no DB row backs it.
PUT the bytes to upload_url, sending the returned content_type verbatim
as the Content-Type header (otherwise S3 rejects the request with
SignatureDoesNotMatch). Then register the dataset by calling
POST /v1/datasets with the returned connection_config. That call
allocates the dataset's UUID id and issues a HEAD request to verify
that the upload landed.
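The three calls described above can be sketched roughly as follows. This is a minimal illustration, not the service's client: the base URL, the request body field name "format", and the shape of the POST /v1/datasets body are assumptions, and authentication is omitted. Only upload_url, content_type, and connection_config are names taken from this page. The transport is injectable so the flow can be exercised without a network.

```python
import json
import urllib.request
from typing import Callable, Optional

API_BASE = "https://api.example.com"  # hypothetical base URL

# (method, url, headers, body) -> raw response bytes
Send = Callable[[str, str, dict, Optional[bytes]], bytes]


def http_send(method: str, url: str, headers: dict, body: Optional[bytes]) -> bytes:
    """Default transport: one HTTP request via urllib (raises on 4xx/5xx)."""
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def upload_and_register(data: bytes, fmt: str, send: Send = http_send) -> dict:
    json_headers = {"Content-Type": "application/json"}

    # Step 1: mint the connection config row and the presigned PUT URL.
    # The body field name "format" is an assumption; this page does not name it.
    raw = send("POST", f"{API_BASE}/v1/uploads", json_headers,
               json.dumps({"format": fmt}).encode())
    upload = json.loads(raw)

    # Step 2: PUT the bytes, echoing content_type verbatim so the
    # presigned signature matches.
    send("PUT", upload["upload_url"],
         {"Content-Type": upload["content_type"]}, data)

    # Step 3: register the dataset over the returned connection_config.
    # Wrapping it under a "connection_config" key is an assumption.
    raw = send("POST", f"{API_BASE}/v1/datasets", json_headers,
               json.dumps({"connection_config": upload["connection_config"]}).encode())
    return json.loads(raw)
```

Passing a fake send function makes it easy to check that the PUT carries exactly the content_type the server returned before pointing the code at a real deployment.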
Body
POST /v1/uploads -- request a presigned URL for direct-to-S3 upload.
Format of the dataset you will PUT to the returned URL. 'pkl' = a pickled pandas DataFrame, 'csv' = RFC4180 CSV with a header row, 'parquet' = Apache Parquet. The label column must be present in the uploaded table and its name is supplied on the subsequent /v1/trainer/run or /v1/reasoning/fit call as label_column.
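Because the label column must already be present in the uploaded table, a client may want to check a CSV locally before requesting an upload URL. The helper below is a hypothetical client-side check, not part of the API; it assumes UTF-8 encoded CSV and only inspects the header row.

```python
import csv
import io


def csv_has_label_column(data: bytes, label_column: str) -> bool:
    """Return True if the first CSV row (the header) contains label_column."""
    # newline="" lets the csv module handle the CRLF line endings itself.
    reader = csv.reader(io.StringIO(data.decode("utf-8"), newline=""))
    header = next(reader, None)  # None when the file is empty
    return header is not None and label_column in header
```

For example, csv_has_label_column(b"age,label\r\n31,1\r\n", "label") is True, while the same bytes checked against "target" give False.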
pkl, csv, parquet
Response
Successful Response
POST /v1/uploads response.
Mints the file_upload_connection_config row (so its id is
Postgres-generated, not derived from the S3 key) and returns it as
connection_config alongside the presigned PUT URL. The caller PUTs the
bytes to upload_url with content_type, then registers a dataset
over connection_config via POST /v1/datasets, which is the call that
assigns the dataset id. No dataset_id is returned here.
FileUploadConnectionConfig plus the server-assigned row identity.
This is the response shape for POST /v1/uploads. It is never used as a
request, so id and state are always present; a dedicated response type
is used instead of nullable fields on the base. The base config is the
type carried in the ConnectionConfig union on requests/inference; this
type adds the persisted id and lifecycle state.
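The "dedicated response type instead of nullable fields" design can be illustrated with a small sketch. The class names PersistedFileUploadConnectionConfig and CreateUploadResponse, the base field storage_key, and the types of id and state are all hypothetical; this page only establishes that the response carries connection_config (with id and state), upload_url, and content_type. Placing upload_url and content_type as siblings of connection_config is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class FileUploadConnectionConfig:
    """Base config, as it would appear in requests."""
    storage_key: str  # hypothetical field; the real base fields are not listed here


@dataclass(frozen=True)
class PersistedFileUploadConnectionConfig(FileUploadConnectionConfig):
    """Response-only type: id and state are required, never optional."""
    id: Union[int, str]  # Postgres-generated; the concrete type is an assumption
    state: str           # lifecycle state; allowed values are not listed here


@dataclass(frozen=True)
class CreateUploadResponse:
    connection_config: PersistedFileUploadConnectionConfig
    upload_url: str
    content_type: str


def parse_create_upload_response(payload: dict) -> CreateUploadResponse:
    """Build the typed response; raises KeyError if id or state is missing."""
    cfg = payload["connection_config"]
    return CreateUploadResponse(
        connection_config=PersistedFileUploadConnectionConfig(
            storage_key=cfg["storage_key"],
            id=cfg["id"],
            state=cfg["state"],
        ),
        upload_url=payload["upload_url"],
        content_type=payload["content_type"],
    )
```

With a separate response type, code that consumes the response never has to handle a missing id or state, while request-side code keeps using the smaller base type.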